You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Askar Bozcan <as...@gmail.com> on 2023/05/03 21:34:30 UTC

Re: Template SQL

Hey, sorry for the late reply but I had some stuff to do IRL & some
deadlines I need to meet right now. I'll try to answer you as soon as I
can, but this week doesn't seem possible.

Keep the questions coming, however, if you haven't solved them yourself
(and if you did, please share your solutions). I'm planning to write a
sort-of "Comprehensive Calcite 101" documentation in the near future which
I am going to be basing on my own experiences & questions in the community.

Kind regards,
Askar

On Mon, 24 Apr 2023 at 22:52, Soumyadeep Mukhopadhyay <so...@gmail.com>
wrote:

> Hello Askar,
>
> Thank you so much for guiding me, I have taken some inspiration from your
> code and from Dremio's implementation and compiled this -
> https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b
> .
>
> I have a few follow-up questions, if I may, regarding what I have
> implemented:
>
> - is it possible to implement this as a rule, probably a RelRule (I am
> saying a RelRule because as far as I have understood RelRule is the way you
> implement rules that you would like to enforce before the physical plan is
> created)?
> *goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120 is
> present, if not then add it in the query (probably transform it).*
> *research done so far* : I probably need to create a RelRule and implement
> the "matches" and "onMatch" methods. The "onMatch" method should transform
> my rel or rex node using transformTo, probably.
> Something like
>
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html
> is
> something which I might need to implement.
> But a few methods are hard to grasp for me and they are:
>
>  - transformTo
>  - RelRule.Config.withOperandFor()
>
> Could you please give me some insight on how I can understand their
> structures?
>
> - is it possible to find whether a node (sql/rel) is GROUP BY or not?
> - if there is an AND clause in WHERE, is there a way to parse all the
> filter conditions separately? (using RexCall.getOperator did not work)
> - is this (what I have done) only a hack or is there a better way to
> implement this?
>
> I understand that my questions are probably not trivial, so I appreciate
> that you are reading my emails and providing your valuable feedback.
>
> Thank you for your time and consideration. :)
>
> With regards,
> Soumyadeep Mukhopadhyay.
>
>
>
> On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <as...@gmail.com>
> wrote:
>
> > Thank you for your kind words :)
> >
> > - Askar
> >
> > On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay <
> soumyamyb95@gmail.com
> > >
> > wrote:
> >
> > > Hello Askar,
> > >
> > > Thank you so much for taking the time to compile this for me.
> > >
> > > I shall be indebted to you for this effort. Please let me know if I can
> > be
> > > of any assistance ever to repay your kindness.
> > >
> > > I appreciate all your help and guidance. Thank you once again. I shall
> go
> > > through the same and will get back to you as soon as possible. Thank
> you
> > > again sir! :)
> > >
> > > With best regards,
> > > Soumyadeep Mukhopadhyay.
> > >
> > >
> > > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <as...@gmail.com>
> > > wrote:
> > >
> > > > Hello again!
> > > > I have prepared this gist to perhaps help you understand how to
> > traverse
> > > > through a Sql parse tree:
> > > > https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
> > > >
> > > > I have implemented a Filterer class that returns false on any queries
> > > which
> > > > have 'HAVING COUNT > x' where x is <= 100, and true for all other
> > > queries.
> > > > I have used Kotlin, so if you have difficulties understanding it (as
> > > you're
> > > > using Scala), check out Kotlin's smart cast and nullability.
> > > >
> > > > Good luck and hopefully that was helpful,
> > > > Askar Bozcan
> > > >
> > > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
> > > > soumyamyb95@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Askar,
> > > > >
> > > > > I have gone through your email a couple of times. I am trying to
> > > > > understand it bit by bit.
> > > > >
> > > > > I have a request, please feel free to say no, is there any code
> base
> > > > where
> > > > > I can refer and understand how to implement my own filter rule?
> > > > > What I realised I asked wrong is I wanted to implement "(HAVING
> > > COUNT(*)
> > > > > > 100)" as a global rule, if there are any aggregation queries.
> > > > >
> > > > > It is a bit difficult for me to understand where I should start.
> > > > > For example, if I follow this
> > > > https://github.com/zabetak/calcite-tutorial where
> > > > > shall I start looking at?
> > > > > In the below image I feel like some kind of comparison is going on,
> > > > should
> > > > > I devise my code like this? (in order to implement something like
> > > > "(HAVING
> > > > > COUNT(*) > 100)")
> > > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
> > > > >
> > > > > Also is there any documentation I can go through regarding how I
> can
> > > > > traverse through the AST?
> > > > > Probably I am not looking in the right places but so far I could
> only
> > > go
> > > > > through Tab9 code examples (or the documentation provided by the
> > > Calcite
> > > > > website) and things did not seem to be clear.
> > > > > I realise it is probably a lot to ask, so whatever you share will
> be
> > a
> > > > lot
> > > > > of help for me.
> > > > >
> > > > > Thanks again for your time, patience and help!
> > > > >
> > > > > With regards,
> > > > > Soumyadeep Mukhopadhyay.
> > > > >
> > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <askar.mulin@gmail.com
> >
> > > > wrote:
> > > > >
> > > > >> Hey,
> > > > >> You can use Planner.reset(). Note that it has mutable state
> inside,
> > so
> > > > do
> > > > >> not reuse the same planner instance in any multi-threading
> > > environment.
> > > > >>
> > > > >> (I am assuming you have access to table metadata so that you will
> be
> > > > able
> > > > >> to convert your SqlNode tree into RelNode tree, relational
> > expression
> > > > >> tree)
> > > > >> - Only return results above a certain threshold when using GROUP
> BY,
> > > for
> > > > >> example (HAVING COUNT(col1) > 100).
> > > > >>
> > > > >> I'm not quite sure I understand this question is HAVING part of
> the
> > > > query?
> > > > >>
> > > > >> - Restrict the column on which joins can happen, or else throw an
> > > error
> > > > >> (almost like analysisException in Spark)
> > > > >>
> > > > >> Do you have access to table/schema metadata?
> > > > >> If you do:
> > > > >> 1) Convert your parsed syntax tree (SqlNode) into a logical
> > relational
> > > > >> tree
> > > > >> (RelNode).
> > > > >> Watch this tutorial by Stamatis:
> > > > >> https://www.youtube.com/watch?v=p1O3E33FIs8.
> > > > >> It will explain way better the usage than I can in an email. (And
> it
> > > > what
> > > > >> introduced me to Calcite's basics :))
> > > > >> 2) Traverse your relational tree by implementing RelShuttle
> > > > >> <
> > > > >>
> > > >
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
> > > > >> >.
> > > > >> (Look up Visitor pattern to better understand how this recursive
> > > > traversal
> > > > >> works)
> > > > >> 3) Ignore any RelNode's which are not LogicalJoin.
> > > > >> 4) When you encounter LogicalJoin, traverse its children with
> > getLeft
> > > > and
> > > > >> getRight
> > > > >> 5) When you encounter LogicalTableScan,  You can get its
> RelOptTable
> > > > with
> > > > >> getTable, and then RelOptTable.getRowType() to find the fields.
> > > > >> (Not 100% about step 5, but should be close to it)
> > > > >>
> > > > >>
> > > > >> - Restrict the columns that can be passed in a select statement or
> > > else
> > > > >> throw an error (like in the point above)
> > > > >> Same logic as above. But instead of TableScan, look at the root
> > > > >> LogicalProject.
> > > > >>
> > > > >> All of the above should be doable with just the parse tree
> (SqlNode)
> > > and
> > > > >> without access to table metadata using SqlShuttle; however, it's
> > > easier
> > > > >> and
> > > > >> less error-prone with relational tree IMO.
> > > > >>
> > > > >> - Askar
> > > > >>
> > > > >>
> > > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
> > > > >> soumyamyb95@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hey Askar,
> > > > >> >
> > > > >> > It worked exactly how you suggested.
> > > > >> >
> > > > >> > ```
> > > > >> >
> > > > >> > private val sqlQuery: String = "SELECT list_of_columns FROM
> > > table_name
> > > > >> > WHERE predicate_column = 'predicate_value'"
> > > > >> > private val frameworkConfig: FrameworkConfig =
> > > > >> > Frameworks.newConfigBuilder.build()
> > > > >> > private val planner: Planner =
> > > Frameworks.getPlanner(frameworkConfig)
> > > > >> > private val planner2: Planner =
> > > Frameworks.getPlanner(frameworkConfig)
> > > > >> > private val planner3: Planner =
> > > Frameworks.getPlanner(frameworkConfig)
> > > > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
> > > > >> > println(sqlNode.getKind)
> > > > >> > private val sqlSelectStmt: SqlSelect =
> > > sqlNode.asInstanceOf[SqlSelect]
> > > > >> >
> > > > >> > private val setSelectColumnsQuery = "SELECT age"
> > > > >> > private val selectList =
> > > > >> >
> > > > >>
> > > >
> > >
> >
> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
> > > > >> > private val setFromTableQuery = "SELECT employee"
> > > > >> > private val fromTable =
> > > > >> >
> > > >
> planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
> > > > >> >
> > > > >> > sqlSelectStmt.setSelectList(selectList)
> > > > >> > sqlSelectStmt.setFrom(fromTable)
> > > > >> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode]
> > > > >> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
> > > > >> >
> > > > >> > ```
> > > > >> >
> > > > >> > The only caveat I see is the need for a new planner for every
> new
> > > > query.
> > > > >> > Should I do something else or is this expected?
> > > > >> >
> > > > >> > On a different note, I wanted to ask about how I can write my
> own
> > > > >> rules. Is
> > > > >> > it possible to enforce following rules:
> > > > >> > - Only return results above a certain threshold when using GROUP
> > BY,
> > > > for
> > > > >> > example (HAVING COUNT(col1) > 100).
> > > > >> > - Restrict the column on which joins can happen, or else throw
> an
> > > > error
> > > > >> > (almost like analysisException in Spark)
> > > > >> > - Restrict the columns that can be passed in a select statement
> or
> > > > else
> > > > >> > throw an error (like in the point above)
> > > > >> >
> > > > >> > Not sure if it is feasible, please feel free to suggest. :)
> Thanks
> > > > again
> > > > >> > for your time!
> > > > >> >
> > > > >> > With regards,
> > > > >> > Soumyadeep Mukhopadhyay.
> > > > >> >
> > > > >> >
> > > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <
> askar.mulin@gmail.com
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> > > Hey Soumyadeep,
> > > > >> > > I think that can work with a few caveats.
> > > > >> > > 0) Use the Planner from Frameworks
> > > > >> > > <
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
> > > > >> > > >
> > > > >> > > 1) Parse the “template query" into a syntax tree (a root
> > SqlNode)
> > > > but
> > > > >> do
> > > > >> > > not validate it
> > > > >> > > 2) Cast the root SqlNode into a SqlSelect
> > > > >> > > <
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
> > > > >> > > >
> > > > >> > > (can
> > > > >> > > verify that it's a SELECT by using getKind())
> > > > >> > > 3) Use the setters from SqlSelect to modify the children of
> root
> > > > >> > SqlSelect
> > > > >> > > node.
> > > > >> > > 4) Unparse the root query back into a string
> > (SqlNode.unparse()).
> > > > >> > >
> > > > >> > > Now the tricky part is 3), as you have to set proper SqlNode
> > types
> > > > as
> > > > >> > > children of the SELECT node for column names, table names and
> > for
> > > > >> WHERE
> > > > >> > > predicate. I don't remember them properly, but what you can do
> > is
> > > > >> parse a
> > > > >> > > proper query (again, don't validate it as you'll then need
> table
> > > > >> > metadata),
> > > > >> > > and check the kinds (getKind()) of children SqlNode's and
> > replace
> > > > them
> > > > >> > with
> > > > >> > > your replacements using setters of root SqlSelect node.
> > > > >> > >
> > > > >> > > This should work, but I'm not 100% certain as I'm unable to
> > check
> > > > >> right
> > > > >> > > now.
> > > > >> > >
> > > > >> > > PS: Take care about dialects. Dialects are used in unparsing
> as
> > a
> > > > >> > > "configuration" of SqlWriter, and can unparse the syntax tree
> > > > >> differently
> > > > >> > > based on the dialect you have chosen.
> > > > >> > >
> > > > >> > > Kind regards,
> > > > >> > > Askar Bozcan
> > > > >> > >
> > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
> > > > >> soumyamyb95@gmail.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > Hello All,
> > > > >> > >
> > > > >> > > I have just heard of Apache Calcite and was exploring the
> > > > >> possibilities.
> > > > >> > I
> > > > >> > > wish to achieve the following, and wanted to check if my hunch
> > is
> > > > >> > correct:
> > > > >> > > - Use a template to build SQL queries, like use jinja-sql or
> > even
> > > > >> pebble
> > > > >> > > (interpret the SqlNode tree kind of structure from my template
> > and
> > > > >> then
> > > > >> > add
> > > > >> > > the necessary fields like table name and group by fields from
> an
> > > > >> input)
> > > > >> > >
> > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would be
> > > inside
> > > > a
> > > > >> > query
> > > > >> > > template (in Jinja-sql it may look like "SELECT
> {{select_fields
> > |
> > > > >> > sqlsafe}}
> > > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause |
> sqlsafe}}"
> > > and
> > > > >> > values
> > > > >> > > like 'select_fields' would be substituted at run-time from a
> > > > >> processing
> > > > >> > > engine like https://pypi.org/project/Jinja2/, but for Java)
> and
> > > the
> > > > >> > output
> > > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL" with
> > > some
> > > > >> > dialect
> > > > >> > > (like Snowflake or Big-Query).
> > > > >> > >
> > > > >> > > Is this possible? Any recommendations or suggestions are
> > welcome.
> > > > >> Even if
> > > > >> > > the approach feels wrong please let me know. :)
> > > > >> > > Thank you for your time and consideration.
> > > > >> > >
> > > > >> > > With regards,
> > > > >> > > Soumyadeep Mukhopadhyay.
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: Template SQL

Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.

Hello All,

Thank you so much Askar for sharing this with me.

What I gather is Calcite can act as a precursor to the execution layer and
help off load tasks to the underlying db engines if the rules are written
correctly.

I have an approach for implementing what Chris was talking about but I am
not sure if this is correct or not.
1. In the video he talks about "SparkTable" , I am guessing he means to
implement something like *org.apache.calcite.schema.Table *or
*org.apache.calcite.schema.impl.AbstractTable
*as SparkTable (not sure if I should use Wrapper class as well or not, I
was looking at some other implementations like RedisTable or
KafkaStreamTable and they seem to implement *ScannableTable *interface).
(Also not sure if I need to write a rule to convert the convention into
Spark convention).
2. If step 1 is successful then maybe pass this "SparkTable" onto the Spark
engine via the adapter and then convert them to dataframes (spark's
abstraction of table).
3. But I am not sure what happens to the rest of the plan and if it is even
possible to preserve the rules without letting Spark rewrite it.

An alternative that came to my mind was to use Apache Beam :
*Approach 1:* Add planner rules -> find a way to convert a RelNode to
something similar in PTransforms (like via BeamRelNode maybe) -> if
required convert the tables (AbstractTable or otherwise) to PCollections ->
use the runner as Spark such that planner rules are preserved or use Google
DataFlow (if Spark throws out the planner rules written by me).

*Approach 2: *Within a planner's rule add a rule to convert the nodes into
the BeamConvention and then carry out the above.

Although I have these approaches in mind I do not know which one would be
better to pursue (given what I am trying to achieve is join between 2
different tables in 2 different environments like BigQuery and Snowflake
and use the optimization rules that can be implemented with Calcite, but I
do not want to write my own distributed query execution engine). If anyone
has any suggestions or any guidance to provide I am all ears. Apologies for
the long email and thanks a lot for reading it!

With regards,
Soumyadeep.

On Sun, Jun 4, 2023 at 6:38 PM Askar Bozcan <as...@gmail.com> wrote:

> Hey again,
>
> I want to apply certain constraints on what the user can actually
> > query from my database. Let's say I am providing a dataset to a user who
> > belongs to a different organization and I only want to expose a schema
> that
> > I am comfortable with via views. Whenever he or she is going to issue a
> > query I would want that query to be parsed and validated against the set
> of
> > constraints or rules that is important to my organisation but at the
> > same time I would not want to trigger an error just because someone did
> not
> > add a having clause (an example), rather I would try to modify the query
> > itself and then push the final query.
>
>
> Honestly, I'd say it's way easier to just modify the query itself on
> SqlNode level, and continue with the Calcite pipeline normally (with the
> modified query), as your rules inadvertently modify the final logical plan,
> so I'd argue these modifications should be done at the start.
>
> ---
>
> *Regarding transforming SQL queries into a RelBuilder syntax:*
> I guess you could do it by traversing the relational tree (RelNode) and
> building a RelBuilder string backwards from it? Although possible, not
> really sure how feasible it is.
> In either case, whatever noSQL language query you may have, if you can
> convert it to (a valid) relational tree in the end, there should be no
> problems.
>
> ----
>
> And lets say I have a table from Snowflake and one in BigQuery, if I try to
> > join them both using Apache Calcite (the way I envision this would happen
> > is create a view within Calcite first, for each of these tables, and then
> > do the join) where will the computation happen?
>
>
> Chris Baynes has a talk mentioning this, check it out:
> https://www.youtube.com/watch?v=4JAOkLKrcYE
> But in summary, the computation location depends on how the data is pushed
> down to the respective DBs. Using optimization rules (in Calcite) you
> should be able to perform all the needed operations on different databases
> in a federated manner and have Calcite as only a ``facade'', pushing down
> the operations to different DBs.
> If you do not push down the operations (through rules), the computations
> will be done by Calcite in memory, so it's definitely undesirable.
> *Granted, I do not know this part practically, and it's on my 'Calcite
> to-do list', so, I won't be able to help out with any details.*
>
> To give an example, Cassandra has no joins. So, what if you WANT to have
> joins still? What (in theory) you could do with Calcite is to write a query
> that accesses the data from Cassandra, and performs the JOINs by sending
> the data to a Spark cluster from where you can access it.
>
>
> Regards,
> Askar
>
> On Sat, 27 May 2023 at 13:39, Soumyadeep Mukhopadhyay <
> soumyamyb95@gmail.com>
> wrote:
>
> > Hello Askar,
> >
> > Thank you again for the replies! Hope all is well! :)
> >
> > To answer your questions:
> > Question: Wanting to use Calcite (Calcite-server in particular) as a
> "proxy
> > DB" to
> > change incoming queries to have HAVING before passing the query to an
> > underlying database?
> > Answer: Yes.
> >
> > Question: Just want to use Calcite to parse a SELECT query, and if it
> > doesn't have
> > a HAVING query, modify it and use it again?
> > Answer: Yes.
> >
> > Question: In either case, what's your use case?
> > Answer: I want to apply certain constraints on what the user can actually
> > query from my database. Let's say I am providing a dataset to a user who
> > belongs to a different organization and I only want to expose a schema
> that
> > I am comfortable with via views. Whenever he or she is going to issue a
> > query I would want that query to be parsed and validated against the set
> of
> > constraints or rules that is important to my organisation but at the
> > same time I would not want to trigger an error just because someone did
> not
> > add a having clause (an example), rather I would try to modify the query
> > itself and then push the final query.
> >
> > Question: `Is there a way to generate the RelBuilder version of the same
> > Sql query?
> > (SqlToRelConverter?) What do you mean by that?`
> > Answer: I was hoping there is some API which would produce a version of
> the
> > SQL statement which is similar to how one writes queries via RelBuilder.
> > My thoughts behind this question was there would be some kind of an API
> > which would convert "SELECT c1 FROM tbl1" into something like
> > builder .scan("tbl1") .project(builder.field("c1"))
> > .build(); and vice versa. I don't have an exact use case for this just
> yet
> > but what I was imagining to do is use something similar as a template
> > (let's say as a FreeMarker template) and then chain these templates,
> > probably via CTEs, and then produce their results (as a SQL string). This
> > way the developer only needs to worry about how the query would look in
> > this fashion and not worry much about how to rewrite a no-sql engine
> based
> > query into SQL, but I think there may not be too many use cases for that.
> >
> > I understand and appreciate why "apply a similar approach to a RelRule"
> > might be an anti-pattern, but, for the sake of trying it out, I want to
> see
> > if it is even possible. I know you said you are not too familiar but if
> you
> > have any roadmap regarding how to implement any planner rule, please feel
> > free to chime in!
> >
> > The following has been on my mind for sometime now but I could not find a
> > close enough answer for this, please feel free to point to anyone, or
> not,
> > if you do not wish to answer this. :)
> >
> > And lets say I have a table from Snowflake and one in BigQuery, if I try
> to
> > join them both using Apache Calcite (the way I envision this would happen
> > is create a view within Calcite first, for each of these tables, and then
> > do the join) where will the computation happen? If the answer is somewhat
> > like this, it would happen in memory and wherever your application is
> > running, then I might need to build a framework which has the capability
> to
> > process the join (maybe over millions of rows) and in a distributed
> manner
> > but also leverages the work that has gone into writing the rules in
> Apache
> > Calcite. So should I consider some other engine or platform that supports
> > Apache Calcite like Apache Beam maybe (seems to me like although they use
> > Calcite I may not be able to deploy my own rules in that engine) or
> > Hazelcast? I am not sure but I am open to any and all suggestions!
> >
> > I can not thank you enough for taking the time to hear my concerns, go
> > through my code and provide feedback. I am more than grateful to you! :)
> >
> > With regards,
> > Soumyadeep Mukhopadhyay.
> >
> > On Sat, May 27, 2023 at 12:05 AM Askar Bozcan <as...@gmail.com>
> > wrote:
> >
> > > Hello again! Sorry for the late reply.
> > > Just to be on the same page, I have some questions because I am not
> 100%
> > > what you are trying to accomplish.
> > >
> > > Are you:
> > > a) Wanting to use Calcite (Calcite-server in particular) as a "proxy
> DB"
> > to
> > > change incoming queries to have HAVING before passing the query to an
> > > underlying database?
> > > b) Just want to use Calcite to parse a SELECT query, and if it doesn't
> > have
> > > a HAVING query, modify it and use it again?
> > >
> > > *In either case, what's your use case?*
> > >
> > > Regarding answers to some of your questions:
> > >
> > > Q: Is it not possible to add a node to a tree when the sql query has
> been
> > > parsed to RelNode after validation?
> > > A: You can, you can set the children of RelNode with
> > > RelNode.replaceInput (though
> > > the caveat here, is you're setting a whole sub-tree, so need to be
> > careful
> > > there)
> > >
> > > Q: If I wish to apply a similar approach to a RelRule which I am trying
> > to
> > > enforce, is that not expected? (an anti-pattern)
> > > A: Physical plan building is one part of Calcite I'm not too familiar
> > with,
> > > however I'd argue that it's an antipattern, as those are used for
> > physical
> > > planning. Physical plans are NOT meant to change the overall relation,
> > and
> > > are for optimizing the query (adapting the query to run efficiently &
> > > correctly on different databases' different architectures)
> > >
> > > Q: Is there a way to generate the RelBuilder version of the same Sql
> > query?
> > > (SqlToRelConverter?)
> > > A: What do you mean by that? RelBuilder is meant for programmatically
> > > building relational expressions (RelNode). This is just an alternative
> to
> > > SQL parsing.
> > >
> > > Regards,
> > > Askar
> > >
> > > On Thu, 25 May 2023 at 20:14, Soumyadeep Mukhopadhyay <
> > > soumyamyb95@gmail.com>
> > > wrote:
> > >
> > > > Hey Askar,
> > > >
> > > > Hope you are doing well. If and when time permits please share your
> > > > feedback. It would be really helpful!
> > > >
> > > > Thanks and Regards,
> > > > Soumyadeep.
> > > >
> > > > On Wed, May 10, 2023 at 2:17 AM Soumyadeep Mukhopadhyay <
> > > > soumyamyb95@gmail.com> wrote:
> > > >
> > > >> Hey Askar,
> > > >>
> > > >> I am happy to inform that I could make a little more progress, and
> > here
> > > >> is the gist -
> > > >>
> > >
> >
> https://gist.github.com/Soumyadeep-github/47c1131bf02149af995d8e3bccb3ee67
> > > >> .
> > > >>
> > > >> Although I am able to do this, I don't think I have reached my goal
> > yet.
> > > >> A question that came to mind was this :
> > > >> - Is it not possible to add a node to a tree when the sql query has
> > been
> > > >> parsed to RelNode after validation?
> > > >> - If I wish to apply a similar approach to a RelRule which I am
> trying
> > > to
> > > >> enforce, is that not expected? (an anti-pattern)
> > > >> - Is there a way to generate the RelBuilder version of the same Sql
> > > >> query? (SqlToRelConverter?)
> > > >>
> > > >> My approach in this gist :
> > > >> get a SqlNode (parsed query) ==> visit each SqlCall node recursively
> > ==>
> > > >> find SqlSelect (since there's getHaving and setHaving) ==> create a
> > new
> > > >> dummy query and extract having from the query ==>
> > > >> set Having for the current query at hand via the dummy query.
> > > >> Note : For no particular reason I have used only visit by SqlCall,
> was
> > > >> trying to build up on what you shared and the other visit methods
> > didn't
> > > >> seem to fit.
> > > >>
> > > >> The approach I have in mind for a RelRule :
> > > >> use ConverterRule ==> find the node under onMatch as Aggregate ==>
> > > >> then use call.transformTo to add a having clause to the existing
> > > >> aggregate node if there's no having clause present using the
> > RelBuilder,
> > > >> but that does not seem feasible.
> > > >> (below is what I was thinking, original query : SELECT o_custkey,
> > > COUNT(*)
> > > >> AS C FROM orders GROUP BY o_custkey)
> > > >> [image: Screenshot 2023-05-10 at 2.09.41 AM.png]
> > > >>
> > > >> Maybe I am missing something. Please feel free to share your
> comments.
> > > :)
> > > >> Thank you again!
> > > >>
> > > >> With regards,
> > > >> Soumyadeep Mukhopadhyay.
> > > >>
> > > >> On Thu, May 4, 2023 at 7:43 AM Soumyadeep Mukhopadhyay <
> > > >> soumyamyb95@gmail.com> wrote:
> > > >>
> > > >>> Hey Askar,
> > > >>>
> > > >>> Thank you so much for your email. Please take as much time as you
> > need.
> > > >>> I will keep trying in the meantime. :)
> > > >>>
> > > >>> With regards,
> > > >>> Soumyadeep Mukhopadhyay.
> > > >>>
> > > >>> On Thu, 4 May 2023 at 3:04 AM, Askar Bozcan <askar.mulin@gmail.com
> >
> > > >>> wrote:
> > > >>>
> > > >>>> Hey, sorry for the late reply but I had some stuff to do IRL &
> some
> > > >>>> deadlines I need to meet right now. I'll try to answer you as soon
> > as
> > > I
> > > >>>> can, but this week doesn't seem possible.
> > > >>>>
> > > >>>> Keep the questions coming, however, if you haven't solved them
> > > yourself
> > > >>>> (and if you did, please share your solutions). I'm planning to
> > write a
> > > >>>> sort-of "Comprehensive Calcite 101" documentation in the near
> future
> > > >>>> which
> > > >>>> I am going to be basing on my own experiences & questions in the
> > > >>>> community.
> > > >>>>
> > > >>>> Kind regards,
> > > >>>> Askar
> > > >>>>
> > > >>>> On Mon, 24 Apr 2023 at 22:52, Soumyadeep Mukhopadhyay <
> > > >>>> soumyamyb95@gmail.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>> > Hello Askar,
> > > >>>> >
> > > >>>> > Thank you so much for guiding me, I have taken some inspiration
> > from
> > > >>>> your
> > > >>>> > code and from Dremio's implementation and compiled this -
> > > >>>> >
> > > >>>>
> > >
> >
> https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b
> > > >>>> > .
> > > >>>> >
> > > >>>> > I have a few follow-up questions, if I may, regarding what I
> have
> > > >>>> > implemented:
> > > >>>> >
> > > >>>> > - is it possible to implement this as a rule, probably a RelRule
> > (I
> > > am
> > > >>>> > saying a RelRule because as far as I have understood RelRule is
> > the
> > > >>>> way you
> > > >>>> > implement rules that you would like to enforce before the
> physical
> > > >>>> plan is
> > > >>>> > created)?
> > > >>>> > *goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120
> > is
> > > >>>> > present, if not then add it in the query (probably transform
> it).*
> > > >>>> > *research done so far* : I probably need to create a RelRule and
> > > >>>> implement
> > > >>>> > the "matches" and "onMatch" methods. The "onMatch" method should
> > > >>>> transform
> > > >>>> > my rel or rex node using transformTo, probably.
> > > >>>> > Something like
> > > >>>> >
> > > >>>> >
> > > >>>>
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html
> > > >>>> > is
> > > >>>> > something which I might need to implement.
> > > >>>> > But a few methods are hard to grasp for me and they are:
> > > >>>> >
> > > >>>> >  - transformTo
> > > >>>> >  - RelRule.Config.withOperandFor()
> > > >>>> >
> > > >>>> > Could you please give me some insight on how I can understand
> > their
> > > >>>> > structures?
> > > >>>> >
> > > >>>> > - is it possible to find whether a node (sql/rel) is GROUP BY or
> > > not?
> > > >>>> > - if there is an AND clause in WHERE, is there a way to parse
> all
> > > the
> > > >>>> > filter conditions separately? (using RexCall.getOperator did not
> > > work)
> > > >>>> > - is this (what I have done) only a hack or is there a better
> way
> > to
> > > >>>> > implement this?
> > > >>>> >
> > > >>>> > I understand that my questions are probably not trivial, so I
> > > >>>> appreciate
> > > >>>> > that you are reading my emails and providing your valuable
> > feedback.
> > > >>>> >
> > > >>>> > Thank you for your time and consideration. :)
> > > >>>> >
> > > >>>> > With regards,
> > > >>>> > Soumyadeep Mukhopadhyay.
> > > >>>> >
> > > >>>> >
> > > >>>> >
> > > >>>> > On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <
> > > askar.mulin@gmail.com>
> > > >>>> > wrote:
> > > >>>> >
> > > >>>> > > Thank you for your kind words :)
> > > >>>> > >
> > > >>>> > > - Askar
> > > >>>> > >
> > > >>>> > > On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay <
> > > >>>> > soumyamyb95@gmail.com
> > > >>>> > > >
> > > >>>> > > wrote:
> > > >>>> > >
> > > >>>> > > > Hello Askar,
> > > >>>> > > >
> > > >>>> > > > Thank you so much for taking the time to compile this for
> me.
> > > >>>> > > >
> > > >>>> > > > I shall be indebted to you for this effort. Please let me
> know
> > > if
> > > >>>> I can
> > > >>>> > > be
> > > >>>> > > > of any assistance ever to repay your kindness.
> > > >>>> > > >
> > > >>>> > > > I appreciate all your help and guidance. Thank you once
> > again. I
> > > >>>> shall
> > > >>>> > go
> > > >>>> > > > through the same and will get back to you as soon as
> possible.
> > > >>>> Thank
> > > >>>> > you
> > > >>>> > > > again sir! :)
> > > >>>> > > >
> > > >>>> > > > With best regards,
> > > >>>> > > > Soumyadeep Mukhopadhyay.
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> > > > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <
> > > >>>> askar.mulin@gmail.com>
> > > >>>> > > > wrote:
> > > >>>> > > >
> > > >>>> > > > > Hello again!
> > > >>>> > > > > I have prepared this gist to perhaps help you understand
> how
> > > to
> > > >>>> > > traverse
> > > >>>> > > > > through a Sql parse tree:
> > > >>>> > > > >
> > > >>>>
> > https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
> > > >>>> > > > >
> > > >>>> > > > > I have implemented a Filterer class that returns false on
> > any
> > > >>>> queries
> > > >>>> > > > which
> > > >>>> > > > > have 'HAVING COUNT > x' where x is <= 100, and true for
> all
> > > >>>> other
> > > >>>> > > > queries.
> > > >>>> > > > > I have used Kotlin, so if you have difficulties
> > understanding
> > > >>>> it (as
> > > >>>> > > > you're
> > > >>>> > > > > using Scala), check out Kotlin's smart cast and
> nullability.
> > > >>>> > > > >
> > > >>>> > > > > Good luck and hopefully that was helpful,
> > > >>>> > > > > Askar Bozcan
> > > >>>> > > > >
> > > >>>> > > > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
> > > >>>> > > > > soumyamyb95@gmail.com>
> > > >>>> > > > > wrote:
> > > >>>> > > > >
> > > >>>> > > > > > Hi Askar,
> > > >>>> > > > > >
> > > >>>> > > > > > I have gone through your email a couple of times. I am
> > > trying
> > > >>>> to
> > > >>>> > > > > > understand it bit by bit.
> > > >>>> > > > > >
> > > >>>> > > > > > I have a request, please feel free to say no, is there
> any
> > > >>>> code
> > > >>>> > base
> > > >>>> > > > > where
> > > >>>> > > > > > I can refer and understand how to implement my own
> filter
> > > >>>> rule?
> > > >>>> > > > > > What I realised I asked wrong is I wanted to implement
> > > >>>> "(HAVING
> > > >>>> > > > COUNT(*)
> > > >>>> > > > > > > 100)" as a global rule, if there are any aggregation
> > > >>>> queries.
> > > >>>> > > > > >
> > > >>>> > > > > > It is a bit difficult for me to understand where I
> should
> > > >>>> start.
> > > >>>> > > > > > For example, if I follow this
> > > >>>> > > > > https://github.com/zabetak/calcite-tutorial where
> > > >>>> > > > > > shall I start looking at?
> > > >>>> > > > > > In the below image I feel like some kind of comparison
> is
> > > >>>> going on,
> > > >>>> > > > > should
> > > >>>> > > > > > I devise my code like this? (in order to implement
> > something
> > > >>>> like
> > > >>>> > > > > "(HAVING
> > > >>>> > > > > > COUNT(*) > 100)")
> > > >>>> > > > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
> > > >>>> > > > > >
> > > >>>> > > > > > Also is there any documentation I can go through
> regarding
> > > >>>> how I
> > > >>>> > can
> > > >>>> > > > > > traverse through the AST?
> > > >>>> > > > > > Probably I am not looking in the right places but so
> far I
> > > >>>> could
> > > >>>> > only
> > > >>>> > > > go
> > > >>>> > > > > > through Tab9 code examples (or the documentation
> provided
> > by
> > > >>>> the
> > > >>>> > > > Calcite
> > > >>>> > > > > > website) and things did not seem to be clear.
> > > >>>> > > > > > I realise it is probably a lot to ask, so whatever you
> > share
> > > >>>> will
> > > >>>> > be
> > > >>>> > > a
> > > >>>> > > > > lot
> > > >>>> > > > > > of help for me.
> > > >>>> > > > > >
> > > >>>> > > > > > Thanks again for your time, patience and help!
> > > >>>> > > > > >
> > > >>>> > > > > > With regards,
> > > >>>> > > > > > Soumyadeep Mukhopadhyay.
> > > >>>> > > > > >
> > > >>>> > > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <
> > > >>>> askar.mulin@gmail.com
> > > >>>> > >
> > > >>>> > > > > wrote:
> > > >>>> > > > > >
> > > >>>> > > > > >> Hey,
> > > >>>> > > > > >> You can use Planner.reset(). Note that it has mutable
> > state
> > > >>>> > inside,
> > > >>>> > > so
> > > >>>> > > > > do
> > > >>>> > > > > >> not reuse the same planner instance in any
> > multi-threading
> > > >>>> > > > environment.
> > > >>>> > > > > >>
> > > >>>> > > > > >> (I am assuming you have access to table metadata so
> that
> > > you
> > > >>>> will
> > > >>>> > be
> > > >>>> > > > > able
> > > >>>> > > > > >> to convert your SqlNode tree into RelNode tree,
> > relational
> > > >>>> > > expression
> > > >>>> > > > > >> tree)
> > > >>>> > > > > >> - Only return results above a certain threshold when
> > using
> > > >>>> GROUP
> > > >>>> > BY,
> > > >>>> > > > for
> > > >>>> > > > > >> example (HAVING COUNT(col1) > 100).
> > > >>>> > > > > >>
> > > >>>> > > > > >> I'm not quite sure I understand this question is HAVING
> > > part
> > > >>>> of
> > > >>>> > the
> > > >>>> > > > > query?
> > > >>>> > > > > >>
> > > >>>> > > > > >> - Restrict the column on which joins can happen, or
> else
> > > >>>> throw an
> > > >>>> > > > error
> > > >>>> > > > > >> (almost like analysisException in Spark)
> > > >>>> > > > > >>
> > > >>>> > > > > >> Do you have access to table/schema metadata?
> > > >>>> > > > > >> If you do:
> > > >>>> > > > > >> 1) Convert your parsed syntax tree (SqlNode) into a
> > logical
> > > >>>> > > relational
> > > >>>> > > > > >> tree
> > > >>>> > > > > >> (RelNode).
> > > >>>> > > > > >> Watch this tutorial by Stamatis:
> > > >>>> > > > > >> https://www.youtube.com/watch?v=p1O3E33FIs8.
> > > >>>> > > > > >> It will explain way better the usage than I can in an
> > > email.
> > > >>>> (And
> > > >>>> > it
> > > >>>> > > > > what
> > > >>>> > > > > >> introduced me to Calcite's basics :))
> > > >>>> > > > > >> 2) Traverse your relational tree by implementing
> > RelShuttle
> > > >>>> > > > > >> <
> > > >>>> > > > > >>
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
> > > >>>> > > > > >> >.
> > > >>>> > > > > >> (Look up Visitor pattern to better understand how this
> > > >>>> recursive
> > > >>>> > > > > traversal
> > > >>>> > > > > >> works)
> > > >>>> > > > > >> 3) Ignore any RelNode's which are not LogicalJoin.
> > > >>>> > > > > >> 4) When you encounter LogicalJoin, traverse its
> children
> > > with
> > > >>>> > > getLeft
> > > >>>> > > > > and
> > > >>>> > > > > >> getRight
> > > >>>> > > > > >> 5) When you encounter LogicalTableScan,  You can get
> its
> > > >>>> > RelOptTable
> > > >>>> > > > > with
> > > >>>> > > > > >> getTable, and then RelOptTable.getRowType() to find the
> > > >>>> fields.
> > > >>>> > > > > >> (Not 100% about step 5, but should be close to it)
> > > >>>> > > > > >>
> > > >>>> > > > > >>
> > > >>>> > > > > >> - Restrict the columns that can be passed in a select
> > > >>>> statement or
> > > >>>> > > > else
> > > >>>> > > > > >> throw an error (like in the point above)
> > > >>>> > > > > >> Same logic as above. But instead of TableScan, look at
> > the
> > > >>>> root
> > > >>>> > > > > >> LogicalProject.
> > > >>>> > > > > >>
> > > >>>> > > > > >> All of the above should be doable with just the parse
> > tree
> > > >>>> > (SqlNode)
> > > >>>> > > > and
> > > >>>> > > > > >> without access to table metadata using SqlShuttle;
> > however,
> > > >>>> it's
> > > >>>> > > > easier
> > > >>>> > > > > >> and
> > > >>>> > > > > >> less error-prone with relational tree IMO.
> > > >>>> > > > > >>
> > > >>>> > > > > >> - Askar
> > > >>>> > > > > >>
> > > >>>> > > > > >>
> > > >>>> > > > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
> > > >>>> > > > > >> soumyamyb95@gmail.com>
> > > >>>> > > > > >> wrote:
> > > >>>> > > > > >>
> > > >>>> > > > > >> > Hey Askar,
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > It worked exactly how you suggested.
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > ```
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > private val sqlQuery: String = "SELECT
> list_of_columns
> > > FROM
> > > >>>> > > > table_name
> > > >>>> > > > > >> > WHERE predicate_column = 'predicate_value'"
> > > >>>> > > > > >> > private val frameworkConfig: FrameworkConfig =
> > > >>>> > > > > >> > Frameworks.newConfigBuilder.build()
> > > >>>> > > > > >> > private val planner: Planner =
> > > >>>> > > > Frameworks.getPlanner(frameworkConfig)
> > > >>>> > > > > >> > private val planner2: Planner =
> > > >>>> > > > Frameworks.getPlanner(frameworkConfig)
> > > >>>> > > > > >> > private val planner3: Planner =
> > > >>>> > > > Frameworks.getPlanner(frameworkConfig)
> > > >>>> > > > > >> > private val sqlNode: SqlNode =
> planner.parse(sqlQuery)
> > > >>>> > > > > >> > println(sqlNode.getKind)
> > > >>>> > > > > >> > private val sqlSelectStmt: SqlSelect =
> > > >>>> > > > sqlNode.asInstanceOf[SqlSelect]
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > private val setSelectColumnsQuery = "SELECT age"
> > > >>>> > > > > >> > private val selectList =
> > > >>>> > > > > >> >
> > > >>>> > > > > >>
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > >
> >
> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
> > > >>>> > > > > >> > private val setFromTableQuery = "SELECT employee"
> > > >>>> > > > > >> > private val fromTable =
> > > >>>> > > > > >> >
> > > >>>> > > > >
> > > >>>> >
> > > >>>>
> > > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > sqlSelectStmt.setSelectList(selectList)
> > > >>>> > > > > >> > sqlSelectStmt.setFrom(fromTable)
> > > >>>> > > > > >> > private val finalQuery =
> > > >>>> sqlSelectStmt.asInstanceOf[SqlNode]
> > > >>>> > > > > >> >
> > > >>>> println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > ```
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > The only caveat I see is the need for a new planner
> for
> > > >>>> every
> > > >>>> > new
> > > >>>> > > > > query.
> > > >>>> > > > > >> > Should I do something else or is this expected?
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > On a different note, I wanted to ask about how I can
> > > write
> > > >>>> my
> > > >>>> > own
> > > >>>> > > > > >> rules. Is
> > > >>>> > > > > >> > it possible to enforce following rules:
> > > >>>> > > > > >> > - Only return results above a certain threshold when
> > > using
> > > >>>> GROUP
> > > >>>> > > BY,
> > > >>>> > > > > for
> > > >>>> > > > > >> > example (HAVING COUNT(col1) > 100).
> > > >>>> > > > > >> > - Restrict the column on which joins can happen, or
> > else
> > > >>>> throw
> > > >>>> > an
> > > >>>> > > > > error
> > > >>>> > > > > >> > (almost like analysisException in Spark)
> > > >>>> > > > > >> > - Restrict the columns that can be passed in a select
> > > >>>> statement
> > > >>>> > or
> > > >>>> > > > > else
> > > >>>> > > > > >> > throw an error (like in the point above)
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > Not sure if it is feasible, please feel free to
> > suggest.
> > > :)
> > > >>>> > Thanks
> > > >>>> > > > > again
> > > >>>> > > > > >> > for your time!
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > With regards,
> > > >>>> > > > > >> > Soumyadeep Mukhopadhyay.
> > > >>>> > > > > >> >
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <
> > > >>>> > askar.mulin@gmail.com
> > > >>>> > > >
> > > >>>> > > > > >> wrote:
> > > >>>> > > > > >> >
> > > >>>> > > > > >> > > Hey Soumyadeep,
> > > >>>> > > > > >> > > I think that can work with a few caveats.
> > > >>>> > > > > >> > > 0) Use the Planner from Frameworks
> > > >>>> > > > > >> > > <
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> >
> > > >>>> > > > > >>
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
> > > >>>> > > > > >> > > >
> > > >>>> > > > > >> > > 1) Parse the “template query" into a syntax tree (a
> > > root
> > > >>>> > > SqlNode)
> > > >>>> > > > > but
> > > >>>> > > > > >> do
> > > >>>> > > > > >> > > not validate it
> > > >>>> > > > > >> > > 2) Cast the root SqlNode into a SqlSelect
> > > >>>> > > > > >> > > <
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> >
> > > >>>> > > > > >>
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
> > > >>>> > > > > >> > > >
> > > >>>> > > > > >> > > (can
> > > >>>> > > > > >> > > verify that it's a SELECT by using getKind())
> > > >>>> > > > > >> > > 3) Use the setters from SqlSelect to modify the
> > > children
> > > >>>> of
> > > >>>> > root
> > > >>>> > > > > >> > SqlSelect
> > > >>>> > > > > >> > > node.
> > > >>>> > > > > >> > > 4) Unparse the root query back into a string
> > > >>>> > > (SqlNode.unparse()).
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> > > Now the tricky part is 3), as you have to set
> proper
> > > >>>> SqlNode
> > > >>>> > > types
> > > >>>> > > > > as
> > > >>>> > > > > >> > > children of the SELECT node for column names, table
> > > >>>> names and
> > > >>>> > > for
> > > >>>> > > > > >> WHERE
> > > >>>> > > > > >> > > predicate. I don't remember them properly, but what
> > you
> > > >>>> can do
> > > >>>> > > is
> > > >>>> > > > > >> parse a
> > > >>>> > > > > >> > > proper query (again, don't validate it as you'll
> then
> > > >>>> need
> > > >>>> > table
> > > >>>> > > > > >> > metadata),
> > > >>>> > > > > >> > > and check the kinds (getKind()) of children
> SqlNode's
> > > and
> > > >>>> > > replace
> > > >>>> > > > > them
> > > >>>> > > > > >> > with
> > > >>>> > > > > >> > > your replacements using setters of root SqlSelect
> > node.
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> > > This should work, but I'm not 100% certain as I'm
> > > unable
> > > >>>> to
> > > >>>> > > check
> > > >>>> > > > > >> right
> > > >>>> > > > > >> > > now.
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> > > PS: Take care about dialects. Dialects are used in
> > > >>>> unparsing
> > > >>>> > as
> > > >>>> > > a
> > > >>>> > > > > >> > > "configuration" of SqlWriter, and can unparse the
> > > syntax
> > > >>>> tree
> > > >>>> > > > > >> differently
> > > >>>> > > > > >> > > based on the dialect you have chosen.
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> > > Kind regards,
> > > >>>> > > > > >> > > Askar Bozcan
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
> > > >>>> > > > > >> soumyamyb95@gmail.com>
> > > >>>> > > > > >> > > wrote:
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> > > Hello All,
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> > > I have just heard of Apache Calcite and was
> exploring
> > > the
> > > >>>> > > > > >> possibilities.
> > > >>>> > > > > >> > I
> > > >>>> > > > > >> > > wish to achieve the following, and wanted to check
> if
> > > my
> > > >>>> hunch
> > > >>>> > > is
> > > >>>> > > > > >> > correct:
> > > >>>> > > > > >> > > - Use a template to build SQL queries, like use
> > > >>>> jinja-sql or
> > > >>>> > > even
> > > >>>> > > > > >> pebble
> > > >>>> > > > > >> > > (interpret the SqlNode tree kind of structure from
> my
> > > >>>> template
> > > >>>> > > and
> > > >>>> > > > > >> then
> > > >>>> > > > > >> > add
> > > >>>> > > > > >> > > the necessary fields like table name and group by
> > > fields
> > > >>>> from
> > > >>>> > an
> > > >>>> > > > > >> input)
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?"
> > > >>>> would be
> > > >>>> > > > inside
> > > >>>> > > > > a
> > > >>>> > > > > >> > query
> > > >>>> > > > > >> > > template (in Jinja-sql it may look like "SELECT
> > > >>>> > {{select_fields
> > > >>>> > > |
> > > >>>> > > > > >> > sqlsafe}}
> > > >>>> > > > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause
> |
> > > >>>> > sqlsafe}}"
> > > >>>> > > > and
> > > >>>> > > > > >> > values
> > > >>>> > > > > >> > > like 'select_fields' would be substituted at
> run-time
> > > >>>> from a
> > > >>>> > > > > >> processing
> > > >>>> > > > > >> > > engine like https://pypi.org/project/Jinja2/, but
> > for
> > > >>>> Java)
> > > >>>> > and
> > > >>>> > > > the
> > > >>>> > > > > >> > output
> > > >>>> > > > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT
> > > >>>> NULL" with
> > > >>>> > > > some
> > > >>>> > > > > >> > dialect
> > > >>>> > > > > >> > > (like Snowflake or Big-Query).
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> > > Is this possible? Any recommendations or
> suggestions
> > > are
> > > >>>> > > welcome.
> > > >>>> > > > > >> Even if
> > > >>>> > > > > >> > > the approach feels wrong please let me know. :)
> > > >>>> > > > > >> > > Thank you for your time and consideration.
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> > > With regards,
> > > >>>> > > > > >> > > Soumyadeep Mukhopadhyay.
> > > >>>> > > > > >> > >
> > > >>>> > > > > >> >
> > > >>>> > > > > >>
> > > >>>> > > > > >
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > > >>>
> > >
> >
>

Re: Template SQL

Posted by Askar Bozcan <as...@gmail.com>.

Hey again,

I want to apply certain constraints on what the user can actually
> query from my database. Let's say I am providing a dataset to a user who
> belongs to a different organization and I only want to expose a schema that
> I am comfortable with via views. Whenever he or she is going to issue a
> query I would want that query to be parsed and validated against the set of
> constraints or rules that is important to my organisation but at the
> same time I would not want to trigger an error just because someone did not
> add a having clause (an example), rather I would try to modify the query
> itself and then push the final query.


Honestly, I'd say it's way easier to just modify the query itself on
SqlNode level, and continue with the Calcite pipeline normally (with the
modified query), as your rules inadvertently modify the final logical plan,
so I'd argue these modifications should be done at the start.

---

*Regarding transforming SQL queries into a RelBuilder syntax:*
I guess you could do it by traversing the relational tree (RelNode) and
building a RelBuilder string backwards from it? Although possible, not
really sure how feasible it is.
In either case, whatever noSQL language query you may have, if you can
convert it to (a valid) relational tree in the end, there should be no
problems.

----

And lets say I have a table from Snowflake and one in BigQuery, if I try to
> join them both using Apache Calcite (the way I envision this would happen
> is create a view within Calcite first, for each of these tables, and then
> do the join) where will the computation happen?


Chris Baynes has a talk mentioning this, check it out:
https://www.youtube.com/watch?v=4JAOkLKrcYE
But in summary, the computation location depends on how the data is pushed
down to the respective DBs. Using optimization rules (in Calcite) you
should be able to perform all the needed operations on different databases
in a federated manner and have Calcite as only a ``facade'', pushing down
the operations to different DBs.
If you do not push down the operations (through rules), the computations
will be done by Calcite in memory, so it's definitely undesirable.
*Granted, I do not know this part practically, and it's on my 'Calcite
to-do list', so, I won't be able to help out with any details.*

To give an example, Cassandra has no joins. So, what if you WANT to have
joins still? What (in theory) you could do with Calcite is to write a query
that accesses the data from Cassandra, and performs the JOINs by sending
the data to a Spark cluster from where you can access it.


Regards,
Askar

On Sat, 27 May 2023 at 13:39, Soumyadeep Mukhopadhyay <so...@gmail.com>
wrote:

> Hello Askar,
>
> Thank you again for the replies! Hope all is well! :)
>
> To answer your questions:
> Question: Wanting to use Calcite (Calcite-server in particular) as a "proxy
> DB" to
> change incoming queries to have HAVING before passing the query to an
> underlying database?
> Answer: Yes.
>
> Question: Just want to use Calcite to parse a SELECT query, and if it
> doesn't have
> a HAVING query, modify it and use it again?
> Answer: Yes.
>
> Question: In either case, what's your use case?
> Answer: I want to apply certain constraints on what the user can actually
> query from my database. Let's say I am providing a dataset to a user who
> belongs to a different organization and I only want to expose a schema that
> I am comfortable with via views. Whenever he or she is going to issue a
> query I would want that query to be parsed and validated against the set of
> constraints or rules that is important to my organisation but at the
> same time I would not want to trigger an error just because someone did not
> add a having clause (an example), rather I would try to modify the query
> itself and then push the final query.
>
> Question: `Is there a way to generate the RelBuilder version of the same
> Sql query?
> (SqlToRelConverter?) What do you mean by that?`
> Answer: I was hoping there is some API which would produce a version of the
> SQL statement which is similar to how one writes queries via RelBuilder.
> My thoughts behind this question was there would be some kind of an API
> which would convert "SELECT c1 FROM tbl1" into something like
> builder .scan("tbl1") .project(builder.field("c1"))
> .build(); and vice versa. I don't have an exact use case for this just yet
> but what I was imagining to do is use something similar as a template
> (let's say as a FreeMarker template) and then chain these templates,
> probably via CTEs, and then produce their results (as a SQL string). This
> way the developer only needs to worry about how the query would look in
> this fashion and not worry much about how to rewrite a no-sql engine based
> query into SQL, but I think there may not be too many use cases for that.
>
> I understand and appreciate why "apply a similar approach to a RelRule"
> might be an anti-pattern, but, for the sake of trying it out, I want to see
> if it is even possible. I know you said you are not too familiar but if you
> have any roadmap regarding how to implement any planner rule, please feel
> free to chime in!
>
> The following has been on my mind for sometime now but I could not find a
> close enough answer for this, please feel free to point to anyone, or not,
> if you do not wish to answer this. :)
>
> And lets say I have a table from Snowflake and one in BigQuery, if I try to
> join them both using Apache Calcite (the way I envision this would happen
> is create a view within Calcite first, for each of these tables, and then
> do the join) where will the computation happen? If the answer is somewhat
> like this, it would happen in memory and wherever your application is
> running, then I might need to build a framework which has the capability to
> process the join (maybe over millions of rows) and in a distributed manner
> but also leverages the work that has gone into writing the rules in Apache
> Calcite. So should I consider some other engine or platform that supports
> Apache Calcite like Apache Beam maybe (seems to me like although they use
> Calcite I may not be able to deploy my own rules in that engine) or
> Hazelcast? I am not sure but I am open to any and all suggestions!
>
> I can not thank you enough for taking the time to hear my concerns, go
> through my code and provide feedback. I am more than grateful to you! :)
>
> With regards,
> Soumyadeep Mukhopadhyay.
>
> On Sat, May 27, 2023 at 12:05 AM Askar Bozcan <as...@gmail.com>
> wrote:
>
> > Hello again! Sorry for the late reply.
> > Just to be on the same page, I have some questions because I am not 100%
> > what you are trying to accomplish.
> >
> > Are you:
> > a) Wanting to use Calcite (Calcite-server in particular) as a "proxy DB"
> to
> > change incoming queries to have HAVING before passing the query to an
> > underlying database?
> > b) Just want to use Calcite to parse a SELECT query, and if it doesn't
> have
> > a HAVING query, modify it and use it again?
> >
> > *In either case, what's your use case?*
> >
> > Regarding answers to some of your questions:
> >
> > Q: Is it not possible to add a node to a tree when the sql query has been
> > parsed to RelNode after validation?
> > A: You can, you can set the children of RelNode with
> > RelNode.replaceInput (though
> > the caveat here, is you're setting a whole sub-tree, so need to be
> careful
> > there)
> >
> > Q: If I wish to apply a similar approach to a RelRule which I am trying
> to
> > enforce, is that not expected? (an anti-pattern)
> > A: Physical plan building is one part of Calcite I'm not too familiar
> with,
> > however I'd argue that it's an antipattern, as those are used for
> physical
> > planning. Physical plans are NOT meant to change the overall relation,
> and
> > are for optimizing the query (adapting the query to run efficiently &
> > correctly on different databases' different architectures)
> >
> > Q: Is there a way to generate the RelBuilder version of the same Sql
> query?
> > (SqlToRelConverter?)
> > A: What do you mean by that? RelBuilder is meant for programmatically
> > building relational expressions (RelNode). This is just an alternative to
> > SQL parsing.
> >
> > Regards,
> > Askar
> >
> > On Thu, 25 May 2023 at 20:14, Soumyadeep Mukhopadhyay <
> > soumyamyb95@gmail.com>
> > wrote:
> >
> > > Hey Askar,
> > >
> > > Hope you are doing well. If and when time permits please share your
> > > feedback. It would be really helpful!
> > >
> > > Thanks and Regards,
> > > Soumyadeep.
> > >
> > > On Wed, May 10, 2023 at 2:17 AM Soumyadeep Mukhopadhyay <
> > > soumyamyb95@gmail.com> wrote:
> > >
> > >> Hey Askar,
> > >>
> > >> I am happy to inform that I could make a little more progress, and
> here
> > >> is the gist -
> > >>
> >
> https://gist.github.com/Soumyadeep-github/47c1131bf02149af995d8e3bccb3ee67
> > >> .
> > >>
> > >> Although I am able to do this, I don't think I have reached my goal
> yet.
> > >> A question that came to mind was this :
> > >> - Is it not possible to add a node to a tree when the sql query has
> been
> > >> parsed to RelNode after validation?
> > >> - If I wish to apply a similar approach to a RelRule which I am trying
> > to
> > >> enforce, is that not expected? (an anti-pattern)
> > >> - Is there a way to generate the RelBuilder version of the same Sql
> > >> query? (SqlToRelConverter?)
> > >>
> > >> My approach in this gist :
> > >> get a SqlNode (parsed query) ==> visit each SqlCall node recursively
> ==>
> > >> find SqlSelect (since there's getHaving and setHaving) ==> create a
> new
> > >> dummy query and extract having from the query ==>
> > >> set Having for the current query at hand via the dummy query.
> > >> Note : For no particular reason I have used only visit by SqlCall, was
> > >> trying to build up on what you shared and the other visit methods
> didn't
> > >> seem to fit.
> > >>
> > >> The approach I have in mind for a RelRule :
> > >> use ConverterRule ==> find the node under onMatch as Aggregate ==>
> > >> then use call.transformTo to add a having clause to the existing
> > >> aggregate node if there's no having clause present using the
> RelBuilder,
> > >> but that does not seem feasible.
> > >> (below is what I was thinking, original query : SELECT o_custkey,
> > COUNT(*)
> > >> AS C FROM orders GROUP BY o_custkey)
> > >> [image: Screenshot 2023-05-10 at 2.09.41 AM.png]
> > >>
> > >> Maybe I am missing something. Please feel free to share your comments.
> > :)
> > >> Thank you again!
> > >>
> > >> With regards,
> > >> Soumyadeep Mukhopadhyay.
> > >>
> > >> On Thu, May 4, 2023 at 7:43 AM Soumyadeep Mukhopadhyay <
> > >> soumyamyb95@gmail.com> wrote:
> > >>
> > >>> Hey Askar,
> > >>>
> > >>> Thank you so much for your email. Please take as much time as you
> need.
> > >>> I will keep trying in the meantime. :)
> > >>>
> > >>> With regards,
> > >>> Soumyadeep Mukhopadhyay.
> > >>>
> > >>> On Thu, 4 May 2023 at 3:04 AM, Askar Bozcan <as...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hey, sorry for the late reply but I had some stuff to do IRL & some
> > >>>> deadlines I need to meet right now. I'll try to answer you as soon
> as
> > I
> > >>>> can, but this week doesn't seem possible.
> > >>>>
> > >>>> Keep the questions coming, however, if you haven't solved them
> > yourself
> > >>>> (and if you did, please share your solutions). I'm planning to
> write a
> > >>>> sort-of "Comprehensive Calcite 101" documentation in the near future
> > >>>> which
> > >>>> I am going to be basing on my own experiences & questions in the
> > >>>> community.
> > >>>>
> > >>>> Kind regards,
> > >>>> Askar
> > >>>>
> > >>>> On Mon, 24 Apr 2023 at 22:52, Soumyadeep Mukhopadhyay <
> > >>>> soumyamyb95@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>> > Hello Askar,
> > >>>> >
> > >>>> > Thank you so much for guiding me, I have taken some inspiration
> from
> > >>>> your
> > >>>> > code and from Dremio's implementation and compiled this -
> > >>>> >
> > >>>>
> >
> https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b
> > >>>> > .
> > >>>> >
> > >>>> > I have a few follow-up questions, if I may, regarding what I have
> > >>>> > implemented:
> > >>>> >
> > >>>> > - is it possible to implement this as a rule, probably a RelRule
> (I
> > am
> > >>>> > saying a RelRule because as far as I have understood RelRule is
> the
> > >>>> way you
> > >>>> > implement rules that you would like to enforce before the physical
> > >>>> plan is
> > >>>> > created)?
> > >>>> > *goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120
> is
> > >>>> > present, if not then add it in the query (probably transform it).*
> > >>>> > *research done so far* : I probably need to create a RelRule and
> > >>>> implement
> > >>>> > the "matches" and "onMatch" methods. The "onMatch" method should
> > >>>> transform
> > >>>> > my rel or rex node using transformTo, probably.
> > >>>> > Something like
> > >>>> >
> > >>>> >
> > >>>>
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html
> > >>>> > is
> > >>>> > something which I might need to implement.
> > >>>> > But a few methods are hard to grasp for me and they are:
> > >>>> >
> > >>>> >  - transformTo
> > >>>> >  - RelRule.Config.withOperandFor()
> > >>>> >
> > >>>> > Could you please give me some insight on how I can understand
> their
> > >>>> > structures?
> > >>>> >
> > >>>> > - is it possible to find whether a node (sql/rel) is GROUP BY or
> > not?
> > >>>> > - if there is an AND clause in WHERE, is there a way to parse all
> > the
> > >>>> > filter conditions separately? (using RexCall.getOperator did not
> > work)
> > >>>> > - is this (what I have done) only a hack or is there a better way
> to
> > >>>> > implement this?
> > >>>> >
> > >>>> > I understand that my questions are probably not trivial, so I
> > >>>> appreciate
> > >>>> > that you are reading my emails and providing your valuable
> feedback.
> > >>>> >
> > >>>> > Thank you for your time and consideration. :)
> > >>>> >
> > >>>> > With regards,
> > >>>> > Soumyadeep Mukhopadhyay.
> > >>>> >
> > >>>> >
> > >>>> >
> > >>>> > On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <
> > askar.mulin@gmail.com>
> > >>>> > wrote:
> > >>>> >
> > >>>> > > Thank you for your kind words :)
> > >>>> > >
> > >>>> > > - Askar
> > >>>> > >
> > >>>> > > On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay <
> > >>>> > soumyamyb95@gmail.com
> > >>>> > > >
> > >>>> > > wrote:
> > >>>> > >
> > >>>> > > > Hello Askar,
> > >>>> > > >
> > >>>> > > > Thank you so much for taking the time to compile this for me.
> > >>>> > > >
> > >>>> > > > I shall be indebted to you for this effort. Please let me know
> > if
> > >>>> I can
> > >>>> > > be
> > >>>> > > > of any assistance ever to repay your kindness.
> > >>>> > > >
> > >>>> > > > I appreciate all your help and guidance. Thank you once
> again. I
> > >>>> shall
> > >>>> > go
> > >>>> > > > through the same and will get back to you as soon as possible.
> > >>>> Thank
> > >>>> > you
> > >>>> > > > again sir! :)
> > >>>> > > >
> > >>>> > > > With best regards,
> > >>>> > > > Soumyadeep Mukhopadhyay.
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <
> > >>>> askar.mulin@gmail.com>
> > >>>> > > > wrote:
> > >>>> > > >
> > >>>> > > > > Hello again!
> > >>>> > > > > I have prepared this gist to perhaps help you understand how
> > to
> > >>>> > > traverse
> > >>>> > > > > through a Sql parse tree:
> > >>>> > > > >
> > >>>>
> https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
> > >>>> > > > >
> > >>>> > > > > I have implemented a Filterer class that returns false on
> any
> > >>>> queries
> > >>>> > > > which
> > >>>> > > > > have 'HAVING COUNT > x' where x is <= 100, and true for all
> > >>>> other
> > >>>> > > > queries.
> > >>>> > > > > I have used Kotlin, so if you have difficulties
> understanding
> > >>>> it (as
> > >>>> > > > you're
> > >>>> > > > > using Scala), check out Kotlin's smart cast and nullability.
> > >>>> > > > >
> > >>>> > > > > Good luck and hopefully that was helpful,
> > >>>> > > > > Askar Bozcan
> > >>>> > > > >
> > >>>> > > > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
> > >>>> > > > > soumyamyb95@gmail.com>
> > >>>> > > > > wrote:
> > >>>> > > > >
> > >>>> > > > > > Hi Askar,
> > >>>> > > > > >
> > >>>> > > > > > I have gone through your email a couple of times. I am
> > trying
> > >>>> to
> > >>>> > > > > > understand it bit by bit.
> > >>>> > > > > >
> > >>>> > > > > > I have a request, please feel free to say no, is there any
> > >>>> code
> > >>>> > base
> > >>>> > > > > where
> > >>>> > > > > > I can refer and understand how to implement my own filter
> > >>>> rule?
> > >>>> > > > > > What I realised I asked wrong is I wanted to implement
> > >>>> "(HAVING
> > >>>> > > > COUNT(*)
> > >>>> > > > > > > 100)" as a global rule, if there are any aggregation
> > >>>> queries.
> > >>>> > > > > >
> > >>>> > > > > > It is a bit difficult for me to understand where I should
> > >>>> start.
> > >>>> > > > > > For example, if I follow this
> > >>>> > > > > https://github.com/zabetak/calcite-tutorial where
> > >>>> > > > > > shall I start looking at?
> > >>>> > > > > > In the below image I feel like some kind of comparison is
> > >>>> going on,
> > >>>> > > > > should
> > >>>> > > > > > I devise my code like this? (in order to implement
> something
> > >>>> like
> > >>>> > > > > "(HAVING
> > >>>> > > > > > COUNT(*) > 100)")
> > >>>> > > > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
> > >>>> > > > > >
> > >>>> > > > > > Also is there any documentation I can go through regarding
> > >>>> how I
> > >>>> > can
> > >>>> > > > > > traverse through the AST?
> > >>>> > > > > > Probably I am not looking in the right places but so far I
> > >>>> could
> > >>>> > only
> > >>>> > > > go
> > >>>> > > > > > through Tab9 code examples (or the documentation provided
> by
> > >>>> the
> > >>>> > > > Calcite
> > >>>> > > > > > website) and things did not seem to be clear.
> > >>>> > > > > > I realise it is probably a lot to ask, so whatever you
> share
> > >>>> will
> > >>>> > be
> > >>>> > > a
> > >>>> > > > > lot
> > >>>> > > > > > of help for me.
> > >>>> > > > > >
> > >>>> > > > > > Thanks again for your time, patience and help!
> > >>>> > > > > >
> > >>>> > > > > > With regards,
> > >>>> > > > > > Soumyadeep Mukhopadhyay.
> > >>>> > > > > >
> > >>>> > > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <
> > >>>> askar.mulin@gmail.com
> > >>>> > >
> > >>>> > > > > wrote:
> > >>>> > > > > >
> > >>>> > > > > >> Hey,
> > >>>> > > > > >> You can use Planner.reset(). Note that it has mutable
> state
> > >>>> > inside,
> > >>>> > > so
> > >>>> > > > > do
> > >>>> > > > > >> not reuse the same planner instance in any
> multi-threading
> > >>>> > > > environment.
> > >>>> > > > > >>
> > >>>> > > > > >> (I am assuming you have access to table metadata so that
> > you
> > >>>> will
> > >>>> > be
> > >>>> > > > > able
> > >>>> > > > > >> to convert your SqlNode tree into RelNode tree,
> relational
> > >>>> > > expression
> > >>>> > > > > >> tree)
> > >>>> > > > > >> - Only return results above a certain threshold when
> using
> > >>>> GROUP
> > >>>> > BY,
> > >>>> > > > for
> > >>>> > > > > >> example (HAVING COUNT(col1) > 100).
> > >>>> > > > > >>
> > >>>> > > > > >> I'm not quite sure I understand this question is HAVING
> > part
> > >>>> of
> > >>>> > the
> > >>>> > > > > query?
> > >>>> > > > > >>
> > >>>> > > > > >> - Restrict the column on which joins can happen, or else
> > >>>> throw an
> > >>>> > > > error
> > >>>> > > > > >> (almost like analysisException in Spark)
> > >>>> > > > > >>
> > >>>> > > > > >> Do you have access to table/schema metadata?
> > >>>> > > > > >> If you do:
> > >>>> > > > > >> 1) Convert your parsed syntax tree (SqlNode) into a
> logical
> > >>>> > > relational
> > >>>> > > > > >> tree
> > >>>> > > > > >> (RelNode).
> > >>>> > > > > >> Watch this tutorial by Stamatis:
> > >>>> > > > > >> https://www.youtube.com/watch?v=p1O3E33FIs8.
> > >>>> > > > > >> It will explain way better the usage than I can in an
> > email.
> > >>>> (And
> > >>>> > it
> > >>>> > > > > what
> > >>>> > > > > >> introduced me to Calcite's basics :))
> > >>>> > > > > >> 2) Traverse your relational tree by implementing
> RelShuttle
> > >>>> > > > > >> <
> > >>>> > > > > >>
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
> > >>>> > > > > >> >.
> > >>>> > > > > >> (Look up Visitor pattern to better understand how this
> > >>>> recursive
> > >>>> > > > > traversal
> > >>>> > > > > >> works)
> > >>>> > > > > >> 3) Ignore any RelNode's which are not LogicalJoin.
> > >>>> > > > > >> 4) When you encounter LogicalJoin, traverse its children
> > with
> > >>>> > > getLeft
> > >>>> > > > > and
> > >>>> > > > > >> getRight
> > >>>> > > > > >> 5) When you encounter LogicalTableScan,  You can get its
> > >>>> > RelOptTable
> > >>>> > > > > with
> > >>>> > > > > >> getTable, and then RelOptTable.getRowType() to find the
> > >>>> fields.
> > >>>> > > > > >> (Not 100% about step 5, but should be close to it)
> > >>>> > > > > >>
> > >>>> > > > > >>
> > >>>> > > > > >> - Restrict the columns that can be passed in a select
> > >>>> statement or
> > >>>> > > > else
> > >>>> > > > > >> throw an error (like in the point above)
> > >>>> > > > > >> Same logic as above. But instead of TableScan, look at
> the
> > >>>> root
> > >>>> > > > > >> LogicalProject.
> > >>>> > > > > >>
> > >>>> > > > > >> All of the above should be doable with just the parse
> tree
> > >>>> > (SqlNode)
> > >>>> > > > and
> > >>>> > > > > >> without access to table metadata using SqlShuttle;
> however,
> > >>>> it's
> > >>>> > > > easier
> > >>>> > > > > >> and
> > >>>> > > > > >> less error-prone with relational tree IMO.
> > >>>> > > > > >>
> > >>>> > > > > >> - Askar
> > >>>> > > > > >>
> > >>>> > > > > >>
> > >>>> > > > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
> > >>>> > > > > >> soumyamyb95@gmail.com>
> > >>>> > > > > >> wrote:
> > >>>> > > > > >>
> > >>>> > > > > >> > Hey Askar,
> > >>>> > > > > >> >
> > >>>> > > > > >> > It worked exactly how you suggested.
> > >>>> > > > > >> >
> > >>>> > > > > >> > ```
> > >>>> > > > > >> >
> > >>>> > > > > >> > private val sqlQuery: String = "SELECT list_of_columns
> > FROM
> > >>>> > > > table_name
> > >>>> > > > > >> > WHERE predicate_column = 'predicate_value'"
> > >>>> > > > > >> > private val frameworkConfig: FrameworkConfig =
> > >>>> > > > > >> > Frameworks.newConfigBuilder.build()
> > >>>> > > > > >> > private val planner: Planner =
> > >>>> > > > Frameworks.getPlanner(frameworkConfig)
> > >>>> > > > > >> > private val planner2: Planner =
> > >>>> > > > Frameworks.getPlanner(frameworkConfig)
> > >>>> > > > > >> > private val planner3: Planner =
> > >>>> > > > Frameworks.getPlanner(frameworkConfig)
> > >>>> > > > > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
> > >>>> > > > > >> > println(sqlNode.getKind)
> > >>>> > > > > >> > private val sqlSelectStmt: SqlSelect =
> > >>>> > > > sqlNode.asInstanceOf[SqlSelect]
> > >>>> > > > > >> >
> > >>>> > > > > >> > private val setSelectColumnsQuery = "SELECT age"
> > >>>> > > > > >> > private val selectList =
> > >>>> > > > > >> >
> > >>>> > > > > >>
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
> > >>>> > > > > >> > private val setFromTableQuery = "SELECT employee"
> > >>>> > > > > >> > private val fromTable =
> > >>>> > > > > >> >
> > >>>> > > > >
> > >>>> >
> > >>>>
> > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
> > >>>> > > > > >> >
> > >>>> > > > > >> > sqlSelectStmt.setSelectList(selectList)
> > >>>> > > > > >> > sqlSelectStmt.setFrom(fromTable)
> > >>>> > > > > >> > private val finalQuery =
> > >>>> sqlSelectStmt.asInstanceOf[SqlNode]
> > >>>> > > > > >> >
> > >>>> println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
> > >>>> > > > > >> >
> > >>>> > > > > >> > ```
> > >>>> > > > > >> >
> > >>>> > > > > >> > The only caveat I see is the need for a new planner for
> > >>>> every
> > >>>> > new
> > >>>> > > > > query.
> > >>>> > > > > >> > Should I do something else or is this expected?
> > >>>> > > > > >> >
> > >>>> > > > > >> > On a different note, I wanted to ask about how I can
> > write
> > >>>> my
> > >>>> > own
> > >>>> > > > > >> rules. Is
> > >>>> > > > > >> > it possible to enforce following rules:
> > >>>> > > > > >> > - Only return results above a certain threshold when
> > using
> > >>>> GROUP
> > >>>> > > BY,
> > >>>> > > > > for
> > >>>> > > > > >> > example (HAVING COUNT(col1) > 100).
> > >>>> > > > > >> > - Restrict the column on which joins can happen, or
> else
> > >>>> throw
> > >>>> > an
> > >>>> > > > > error
> > >>>> > > > > >> > (almost like analysisException in Spark)
> > >>>> > > > > >> > - Restrict the columns that can be passed in a select
> > >>>> statement
> > >>>> > or
> > >>>> > > > > else
> > >>>> > > > > >> > throw an error (like in the point above)
> > >>>> > > > > >> >
> > >>>> > > > > >> > Not sure if it is feasible, please feel free to
> suggest.
> > :)
> > >>>> > Thanks
> > >>>> > > > > again
> > >>>> > > > > >> > for your time!
> > >>>> > > > > >> >
> > >>>> > > > > >> > With regards,
> > >>>> > > > > >> > Soumyadeep Mukhopadhyay.
> > >>>> > > > > >> >
> > >>>> > > > > >> >
> > >>>> > > > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <
> > >>>> > askar.mulin@gmail.com
> > >>>> > > >
> > >>>> > > > > >> wrote:
> > >>>> > > > > >> >
> > >>>> > > > > >> > > Hey Soumyadeep,
> > >>>> > > > > >> > > I think that can work with a few caveats.
> > >>>> > > > > >> > > 0) Use the Planner from Frameworks
> > >>>> > > > > >> > > <
> > >>>> > > > > >> > >
> > >>>> > > > > >> >
> > >>>> > > > > >>
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
> > >>>> > > > > >> > > >
> > >>>> > > > > >> > > 1) Parse the “template query" into a syntax tree (a
> > root
> > >>>> > > SqlNode)
> > >>>> > > > > but
> > >>>> > > > > >> do
> > >>>> > > > > >> > > not validate it
> > >>>> > > > > >> > > 2) Cast the root SqlNode into a SqlSelect
> > >>>> > > > > >> > > <
> > >>>> > > > > >> > >
> > >>>> > > > > >> >
> > >>>> > > > > >>
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
> > >>>> > > > > >> > > >
> > >>>> > > > > >> > > (can
> > >>>> > > > > >> > > verify that it's a SELECT by using getKind())
> > >>>> > > > > >> > > 3) Use the setters from SqlSelect to modify the
> > children
> > >>>> of
> > >>>> > root
> > >>>> > > > > >> > SqlSelect
> > >>>> > > > > >> > > node.
> > >>>> > > > > >> > > 4) Unparse the root query back into a string
> > >>>> > > (SqlNode.unparse()).
> > >>>> > > > > >> > >
> > >>>> > > > > >> > > Now the tricky part is 3), as you have to set proper
> > >>>> SqlNode
> > >>>> > > types
> > >>>> > > > > as
> > >>>> > > > > >> > > children of the SELECT node for column names, table
> > >>>> names and
> > >>>> > > for
> > >>>> > > > > >> WHERE
> > >>>> > > > > >> > > predicate. I don't remember them properly, but what
> you
> > >>>> can do
> > >>>> > > is
> > >>>> > > > > >> parse a
> > >>>> > > > > >> > > proper query (again, don't validate it as you'll then
> > >>>> need
> > >>>> > table
> > >>>> > > > > >> > metadata),
> > >>>> > > > > >> > > and check the kinds (getKind()) of children SqlNode's
> > and
> > >>>> > > replace
> > >>>> > > > > them
> > >>>> > > > > >> > with
> > >>>> > > > > >> > > your replacements using setters of root SqlSelect
> node.
> > >>>> > > > > >> > >
> > >>>> > > > > >> > > This should work, but I'm not 100% certain as I'm
> > unable
> > >>>> to
> > >>>> > > check
> > >>>> > > > > >> right
> > >>>> > > > > >> > > now.
> > >>>> > > > > >> > >
> > >>>> > > > > >> > > PS: Take care about dialects. Dialects are used in
> > >>>> unparsing
> > >>>> > as
> > >>>> > > a
> > >>>> > > > > >> > > "configuration" of SqlWriter, and can unparse the
> > syntax
> > >>>> tree
> > >>>> > > > > >> differently
> > >>>> > > > > >> > > based on the dialect you have chosen.
> > >>>> > > > > >> > >
> > >>>> > > > > >> > > Kind regards,
> > >>>> > > > > >> > > Askar Bozcan
> > >>>> > > > > >> > >
> > >>>> > > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
> > >>>> > > > > >> soumyamyb95@gmail.com>
> > >>>> > > > > >> > > wrote:
> > >>>> > > > > >> > >
> > >>>> > > > > >> > > Hello All,
> > >>>> > > > > >> > >
> > >>>> > > > > >> > > I have just heard of Apache Calcite and was exploring
> > the
> > >>>> > > > > >> possibilities.
> > >>>> > > > > >> > I
> > >>>> > > > > >> > > wish to achieve the following, and wanted to check if
> > my
> > >>>> hunch
> > >>>> > > is
> > >>>> > > > > >> > correct:
> > >>>> > > > > >> > > - Use a template to build SQL queries, like use
> > >>>> jinja-sql or
> > >>>> > > even
> > >>>> > > > > >> pebble
> > >>>> > > > > >> > > (interpret the SqlNode tree kind of structure from my
> > >>>> template
> > >>>> > > and
> > >>>> > > > > >> then
> > >>>> > > > > >> > add
> > >>>> > > > > >> > > the necessary fields like table name and group by
> > fields
> > >>>> from
> > >>>> > an
> > >>>> > > > > >> input)
> > >>>> > > > > >> > >
> > >>>> > > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?"
> > >>>> would be
> > >>>> > > > inside
> > >>>> > > > > a
> > >>>> > > > > >> > query
> > >>>> > > > > >> > > template (in Jinja-sql it may look like "SELECT
> > >>>> > {{select_fields
> > >>>> > > |
> > >>>> > > > > >> > sqlsafe}}
> > >>>> > > > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause |
> > >>>> > sqlsafe}}"
> > >>>> > > > and
> > >>>> > > > > >> > values
> > >>>> > > > > >> > > like 'select_fields' would be substituted at run-time
> > >>>> from a
> > >>>> > > > > >> processing
> > >>>> > > > > >> > > engine like https://pypi.org/project/Jinja2/, but
> for
> > >>>> Java)
> > >>>> > and
> > >>>> > > > the
> > >>>> > > > > >> > output
> > >>>> > > > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT
> > >>>> NULL" with
> > >>>> > > > some
> > >>>> > > > > >> > dialect
> > >>>> > > > > >> > > (like Snowflake or Big-Query).
> > >>>> > > > > >> > >
> > >>>> > > > > >> > > Is this possible? Any recommendations or suggestions
> > are
> > >>>> > > welcome.
> > >>>> > > > > >> Even if
> > >>>> > > > > >> > > the approach feels wrong please let me know. :)
> > >>>> > > > > >> > > Thank you for your time and consideration.
> > >>>> > > > > >> > >
> > >>>> > > > > >> > > With regards,
> > >>>> > > > > >> > > Soumyadeep Mukhopadhyay.
> > >>>> > > > > >> > >
> > >>>> > > > > >> >
> > >>>> > > > > >>
> > >>>> > > > > >
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> > >>>
> >
>

Re: Template SQL

Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.

Hello Askar,

Thank you again for the replies! Hope all is well! :)

To answer your questions:
Question: Wanting to use Calcite (Calcite-server in particular) as a "proxy
DB" to
change incoming queries to have HAVING before passing the query to an
underlying database?
Answer: Yes.

Question: Just want to use Calcite to parse a SELECT query, and if it
doesn't have
a HAVING query, modify it and use it again?
Answer: Yes.

Question: In either case, what's your use case?
Answer: I want to apply certain constraints on what the user can actually
query from my database. Let's say I am providing a dataset to a user who
belongs to a different organization and I only want to expose a schema that
I am comfortable with via views. Whenever he or she is going to issue a
query I would want that query to be parsed and validated against the set of
constraints or rules that is important to my organisation but at the
same time I would not want to trigger an error just because someone did not
add a having clause (an example), rather I would try to modify the query
itself and then push the final query.

Question: `Is there a way to generate the RelBuilder version of the same
Sql query?
(SqlToRelConverter?) What do you mean by that?`
Answer: I was hoping there is some API which would produce a version of the
SQL statement which is similar to how one writes queries via RelBuilder.
My thoughts behind this question was there would be some kind of an API
which would convert "SELECT c1 FROM tbl1" into something like
builder .scan("tbl1") .project(builder.field("c1"))
.build(); and vice versa. I don't have an exact use case for this just yet
but what I was imagining to do is use something similar as a template
(let's say as a FreeMarker template) and then chain these templates,
probably via CTEs, and then produce their results (as a SQL string). This
way the developer only needs to worry about how the query would look in
this fashion and not worry much about how to rewrite a no-sql engine based
query into SQL, but I think there may not be too many use cases for that.

I understand and appreciate why "apply a similar approach to a RelRule"
might be an anti-pattern, but, for the sake of trying it out, I want to see
if it is even possible. I know you said you are not too familiar but if you
have any roadmap regarding how to implement any planner rule, please feel
free to chime in!

The following has been on my mind for sometime now but I could not find a
close enough answer for this, please feel free to point to anyone, or not,
if you do not wish to answer this. :)

And lets say I have a table from Snowflake and one in BigQuery, if I try to
join them both using Apache Calcite (the way I envision this would happen
is create a view within Calcite first, for each of these tables, and then
do the join) where will the computation happen? If the answer is somewhat
like this, it would happen in memory and wherever your application is
running, then I might need to build a framework which has the capability to
process the join (maybe over millions of rows) and in a distributed manner
but also leverages the work that has gone into writing the rules in Apache
Calcite. So should I consider some other engine or platform that supports
Apache Calcite like Apache Beam maybe (seems to me like although they use
Calcite I may not be able to deploy my own rules in that engine) or
Hazelcast? I am not sure but I am open to any and all suggestions!

I can not thank you enough for taking the time to hear my concerns, go
through my code and provide feedback. I am more than grateful to you! :)

With regards,
Soumyadeep Mukhopadhyay.

On Sat, May 27, 2023 at 12:05 AM Askar Bozcan <as...@gmail.com> wrote:

> Hello again! Sorry for the late reply.
> Just to be on the same page, I have some questions because I am not 100%
> what you are trying to accomplish.
>
> Are you:
> a) Wanting to use Calcite (Calcite-server in particular) as a "proxy DB" to
> change incoming queries to have HAVING before passing the query to an
> underlying database?
> b) Just want to use Calcite to parse a SELECT query, and if it doesn't have
> a HAVING query, modify it and use it again?
>
> *In either case, what's your use case?*
>
> Regarding answers to some of your questions:
>
> Q: Is it not possible to add a node to a tree when the sql query has been
> parsed to RelNode after validation?
> A: You can, you can set the children of RelNode with
> RelNode.replaceInput (though
> the caveat here, is you're setting a whole sub-tree, so need to be careful
> there)
>
> Q: If I wish to apply a similar approach to a RelRule which I am trying to
> enforce, is that not expected? (an anti-pattern)
> A: Physical plan building is one part of Calcite I'm not too familiar with,
> however I'd argue that it's an antipattern, as those are used for physical
> planning. Physical plans are NOT meant to change the overall relation, and
> are for optimizing the query (adapting the query to run efficiently &
> correctly on different databases' different architectures)
>
> Q: Is there a way to generate the RelBuilder version of the same Sql query?
> (SqlToRelConverter?)
> A: What do you mean by that? RelBuilder is meant for programmatically
> building relational expressions (RelNode). This is just an alternative to
> SQL parsing.
>
> Regards,
> Askar
>
> On Thu, 25 May 2023 at 20:14, Soumyadeep Mukhopadhyay <
> soumyamyb95@gmail.com>
> wrote:
>
> > Hey Askar,
> >
> > Hope you are doing well. If and when time permits please share your
> > feedback. It would be really helpful!
> >
> > Thanks and Regards,
> > Soumyadeep.
> >
> > On Wed, May 10, 2023 at 2:17 AM Soumyadeep Mukhopadhyay <
> > soumyamyb95@gmail.com> wrote:
> >
> >> Hey Askar,
> >>
> >> I am happy to inform that I could make a little more progress, and here
> >> is the gist -
> >>
> https://gist.github.com/Soumyadeep-github/47c1131bf02149af995d8e3bccb3ee67
> >> .
> >>
> >> Although I am able to do this, I don't think I have reached my goal yet.
> >> A question that came to mind was this :
> >> - Is it not possible to add a node to a tree when the sql query has been
> >> parsed to RelNode after validation?
> >> - If I wish to apply a similar approach to a RelRule which I am trying
> to
> >> enforce, is that not expected? (an anti-pattern)
> >> - Is there a way to generate the RelBuilder version of the same Sql
> >> query? (SqlToRelConverter?)
> >>
> >> My approach in this gist :
> >> get a SqlNode (parsed query) ==> visit each SqlCall node recursively ==>
> >> find SqlSelect (since there's getHaving and setHaving) ==> create a new
> >> dummy query and extract having from the query ==>
> >> set Having for the current query at hand via the dummy query.
> >> Note : For no particular reason I have used only visit by SqlCall, was
> >> trying to build up on what you shared and the other visit methods didn't
> >> seem to fit.
> >>
> >> The approach I have in mind for a RelRule :
> >> use ConverterRule ==> find the node under onMatch as Aggregate ==>
> >> then use call.transformTo to add a having clause to the existing
> >> aggregate node if there's no having clause present using the RelBuilder,
> >> but that does not seem feasible.
> >> (below is what I was thinking, original query : SELECT o_custkey,
> COUNT(*)
> >> AS C FROM orders GROUP BY o_custkey)
> >> [image: Screenshot 2023-05-10 at 2.09.41 AM.png]
> >>
> >> Maybe I am missing something. Please feel free to share your comments.
> :)
> >> Thank you again!
> >>
> >> With regards,
> >> Soumyadeep Mukhopadhyay.
> >>
> >> On Thu, May 4, 2023 at 7:43 AM Soumyadeep Mukhopadhyay <
> >> soumyamyb95@gmail.com> wrote:
> >>
> >>> Hey Askar,
> >>>
> >>> Thank you so much for your email. Please take as much time as you need.
> >>> I will keep trying in the meantime. :)
> >>>
> >>> With regards,
> >>> Soumyadeep Mukhopadhyay.
> >>>
> >>> On Thu, 4 May 2023 at 3:04 AM, Askar Bozcan <as...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hey, sorry for the late reply but I had some stuff to do IRL & some
> >>>> deadlines I need to meet right now. I'll try to answer you as soon as
> I
> >>>> can, but this week doesn't seem possible.
> >>>>
> >>>> Keep the questions coming, however, if you haven't solved them
> yourself
> >>>> (and if you did, please share your solutions). I'm planning to write a
> >>>> sort-of "Comprehensive Calcite 101" documentation in the near future
> >>>> which
> >>>> I am going to be basing on my own experiences & questions in the
> >>>> community.
> >>>>
> >>>> Kind regards,
> >>>> Askar
> >>>>
> >>>> On Mon, 24 Apr 2023 at 22:52, Soumyadeep Mukhopadhyay <
> >>>> soumyamyb95@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > Hello Askar,
> >>>> >
> >>>> > Thank you so much for guiding me, I have taken some inspiration from
> >>>> your
> >>>> > code and from Dremio's implementation and compiled this -
> >>>> >
> >>>>
> https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b
> >>>> > .
> >>>> >
> >>>> > I have a few follow-up questions, if I may, regarding what I have
> >>>> > implemented:
> >>>> >
> >>>> > - is it possible to implement this as a rule, probably a RelRule (I
> am
> >>>> > saying a RelRule because as far as I have understood RelRule is the
> >>>> way you
> >>>> > implement rules that you would like to enforce before the physical
> >>>> plan is
> >>>> > created)?
> >>>> > *goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120 is
> >>>> > present, if not then add it in the query (probably transform it).*
> >>>> > *research done so far* : I probably need to create a RelRule and
> >>>> implement
> >>>> > the "matches" and "onMatch" methods. The "onMatch" method should
> >>>> transform
> >>>> > my rel or rex node using transformTo, probably.
> >>>> > Something like
> >>>> >
> >>>> >
> >>>>
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html
> >>>> > is
> >>>> > something which I might need to implement.
> >>>> > But a few methods are hard to grasp for me and they are:
> >>>> >
> >>>> >  - transformTo
> >>>> >  - RelRule.Config.withOperandFor()
> >>>> >
> >>>> > Could you please give me some insight on how I can understand their
> >>>> > structures?
> >>>> >
> >>>> > - is it possible to find whether a node (sql/rel) is GROUP BY or
> not?
> >>>> > - if there is an AND clause in WHERE, is there a way to parse all
> the
> >>>> > filter conditions separately? (using RexCall.getOperator did not
> work)
> >>>> > - is this (what I have done) only a hack or is there a better way to
> >>>> > implement this?
> >>>> >
> >>>> > I understand that my questions are probably not trivial, so I
> >>>> appreciate
> >>>> > that you are reading my emails and providing your valuable feedback.
> >>>> >
> >>>> > Thank you for your time and consideration. :)
> >>>> >
> >>>> > With regards,
> >>>> > Soumyadeep Mukhopadhyay.
> >>>> >
> >>>> >
> >>>> >
> >>>> > On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <
> askar.mulin@gmail.com>
> >>>> > wrote:
> >>>> >
> >>>> > > Thank you for your kind words :)
> >>>> > >
> >>>> > > - Askar
> >>>> > >
> >>>> > > On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay <
> >>>> > soumyamyb95@gmail.com
> >>>> > > >
> >>>> > > wrote:
> >>>> > >
> >>>> > > > Hello Askar,
> >>>> > > >
> >>>> > > > Thank you so much for taking the time to compile this for me.
> >>>> > > >
> >>>> > > > I shall be indebted to you for this effort. Please let me know
> if
> >>>> I can
> >>>> > > be
> >>>> > > > of any assistance ever to repay your kindness.
> >>>> > > >
> >>>> > > > I appreciate all your help and guidance. Thank you once again. I
> >>>> shall
> >>>> > go
> >>>> > > > through the same and will get back to you as soon as possible.
> >>>> Thank
> >>>> > you
> >>>> > > > again sir! :)
> >>>> > > >
> >>>> > > > With best regards,
> >>>> > > > Soumyadeep Mukhopadhyay.
> >>>> > > >
> >>>> > > >
> >>>> > > > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <
> >>>> askar.mulin@gmail.com>
> >>>> > > > wrote:
> >>>> > > >
> >>>> > > > > Hello again!
> >>>> > > > > I have prepared this gist to perhaps help you understand how
> to
> >>>> > > traverse
> >>>> > > > > through a Sql parse tree:
> >>>> > > > >
> >>>> https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
> >>>> > > > >
> >>>> > > > > I have implemented a Filterer class that returns false on any
> >>>> queries
> >>>> > > > which
> >>>> > > > > have 'HAVING COUNT > x' where x is <= 100, and true for all
> >>>> other
> >>>> > > > queries.
> >>>> > > > > I have used Kotlin, so if you have difficulties understanding
> >>>> it (as
> >>>> > > > you're
> >>>> > > > > using Scala), check out Kotlin's smart cast and nullability.
> >>>> > > > >
> >>>> > > > > Good luck and hopefully that was helpful,
> >>>> > > > > Askar Bozcan
> >>>> > > > >
> >>>> > > > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
> >>>> > > > > soumyamyb95@gmail.com>
> >>>> > > > > wrote:
> >>>> > > > >
> >>>> > > > > > Hi Askar,
> >>>> > > > > >
> >>>> > > > > > I have gone through your email a couple of times. I am
> trying
> >>>> to
> >>>> > > > > > understand it bit by bit.
> >>>> > > > > >
> >>>> > > > > > I have a request, please feel free to say no, is there any
> >>>> code
> >>>> > base
> >>>> > > > > where
> >>>> > > > > > I can refer and understand how to implement my own filter
> >>>> rule?
> >>>> > > > > > What I realised I asked wrong is I wanted to implement
> >>>> "(HAVING
> >>>> > > > COUNT(*)
> >>>> > > > > > > 100)" as a global rule, if there are any aggregation
> >>>> queries.
> >>>> > > > > >
> >>>> > > > > > It is a bit difficult for me to understand where I should
> >>>> start.
> >>>> > > > > > For example, if I follow this
> >>>> > > > > https://github.com/zabetak/calcite-tutorial where
> >>>> > > > > > shall I start looking at?
> >>>> > > > > > In the below image I feel like some kind of comparison is
> >>>> going on,
> >>>> > > > > should
> >>>> > > > > > I devise my code like this? (in order to implement something
> >>>> like
> >>>> > > > > "(HAVING
> >>>> > > > > > COUNT(*) > 100)")
> >>>> > > > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
> >>>> > > > > >
> >>>> > > > > > Also is there any documentation I can go through regarding
> >>>> how I
> >>>> > can
> >>>> > > > > > traverse through the AST?
> >>>> > > > > > Probably I am not looking in the right places but so far I
> >>>> could
> >>>> > only
> >>>> > > > go
> >>>> > > > > > through Tab9 code examples (or the documentation provided by
> >>>> the
> >>>> > > > Calcite
> >>>> > > > > > website) and things did not seem to be clear.
> >>>> > > > > > I realise it is probably a lot to ask, so whatever you share
> >>>> will
> >>>> > be
> >>>> > > a
> >>>> > > > > lot
> >>>> > > > > > of help for me.
> >>>> > > > > >
> >>>> > > > > > Thanks again for your time, patience and help!
> >>>> > > > > >
> >>>> > > > > > With regards,
> >>>> > > > > > Soumyadeep Mukhopadhyay.
> >>>> > > > > >
> >>>> > > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <
> >>>> askar.mulin@gmail.com
> >>>> > >
> >>>> > > > > wrote:
> >>>> > > > > >
> >>>> > > > > >> Hey,
> >>>> > > > > >> You can use Planner.reset(). Note that it has mutable state
> >>>> > inside,
> >>>> > > so
> >>>> > > > > do
> >>>> > > > > >> not reuse the same planner instance in any multi-threading
> >>>> > > > environment.
> >>>> > > > > >>
> >>>> > > > > >> (I am assuming you have access to table metadata so that
> you
> >>>> will
> >>>> > be
> >>>> > > > > able
> >>>> > > > > >> to convert your SqlNode tree into RelNode tree, relational
> >>>> > > expression
> >>>> > > > > >> tree)
> >>>> > > > > >> - Only return results above a certain threshold when using
> >>>> GROUP
> >>>> > BY,
> >>>> > > > for
> >>>> > > > > >> example (HAVING COUNT(col1) > 100).
> >>>> > > > > >>
> >>>> > > > > >> I'm not quite sure I understand this question is HAVING
> part
> >>>> of
> >>>> > the
> >>>> > > > > query?
> >>>> > > > > >>
> >>>> > > > > >> - Restrict the column on which joins can happen, or else
> >>>> throw an
> >>>> > > > error
> >>>> > > > > >> (almost like analysisException in Spark)
> >>>> > > > > >>
> >>>> > > > > >> Do you have access to table/schema metadata?
> >>>> > > > > >> If you do:
> >>>> > > > > >> 1) Convert your parsed syntax tree (SqlNode) into a logical
> >>>> > > relational
> >>>> > > > > >> tree
> >>>> > > > > >> (RelNode).
> >>>> > > > > >> Watch this tutorial by Stamatis:
> >>>> > > > > >> https://www.youtube.com/watch?v=p1O3E33FIs8.
> >>>> > > > > >> It will explain way better the usage than I can in an
> email.
> >>>> (And
> >>>> > it
> >>>> > > > > what
> >>>> > > > > >> introduced me to Calcite's basics :))
> >>>> > > > > >> 2) Traverse your relational tree by implementing RelShuttle
> >>>> > > > > >> <
> >>>> > > > > >>
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
> >>>> > > > > >> >.
> >>>> > > > > >> (Look up Visitor pattern to better understand how this
> >>>> recursive
> >>>> > > > > traversal
> >>>> > > > > >> works)
> >>>> > > > > >> 3) Ignore any RelNode's which are not LogicalJoin.
> >>>> > > > > >> 4) When you encounter LogicalJoin, traverse its children
> with
> >>>> > > getLeft
> >>>> > > > > and
> >>>> > > > > >> getRight
> >>>> > > > > >> 5) When you encounter LogicalTableScan,  You can get its
> >>>> > RelOptTable
> >>>> > > > > with
> >>>> > > > > >> getTable, and then RelOptTable.getRowType() to find the
> >>>> fields.
> >>>> > > > > >> (Not 100% about step 5, but should be close to it)
> >>>> > > > > >>
> >>>> > > > > >>
> >>>> > > > > >> - Restrict the columns that can be passed in a select
> >>>> statement or
> >>>> > > > else
> >>>> > > > > >> throw an error (like in the point above)
> >>>> > > > > >> Same logic as above. But instead of TableScan, look at the
> >>>> root
> >>>> > > > > >> LogicalProject.
> >>>> > > > > >>
> >>>> > > > > >> All of the above should be doable with just the parse tree
> >>>> > (SqlNode)
> >>>> > > > and
> >>>> > > > > >> without access to table metadata using SqlShuttle; however,
> >>>> it's
> >>>> > > > easier
> >>>> > > > > >> and
> >>>> > > > > >> less error-prone with relational tree IMO.
> >>>> > > > > >>
> >>>> > > > > >> - Askar
> >>>> > > > > >>
> >>>> > > > > >>
> >>>> > > > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
> >>>> > > > > >> soumyamyb95@gmail.com>
> >>>> > > > > >> wrote:
> >>>> > > > > >>
> >>>> > > > > >> > Hey Askar,
> >>>> > > > > >> >
> >>>> > > > > >> > It worked exactly how you suggested.
> >>>> > > > > >> >
> >>>> > > > > >> > ```
> >>>> > > > > >> >
> >>>> > > > > >> > private val sqlQuery: String = "SELECT list_of_columns
> FROM
> >>>> > > > table_name
> >>>> > > > > >> > WHERE predicate_column = 'predicate_value'"
> >>>> > > > > >> > private val frameworkConfig: FrameworkConfig =
> >>>> > > > > >> > Frameworks.newConfigBuilder.build()
> >>>> > > > > >> > private val planner: Planner =
> >>>> > > > Frameworks.getPlanner(frameworkConfig)
> >>>> > > > > >> > private val planner2: Planner =
> >>>> > > > Frameworks.getPlanner(frameworkConfig)
> >>>> > > > > >> > private val planner3: Planner =
> >>>> > > > Frameworks.getPlanner(frameworkConfig)
> >>>> > > > > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
> >>>> > > > > >> > println(sqlNode.getKind)
> >>>> > > > > >> > private val sqlSelectStmt: SqlSelect =
> >>>> > > > sqlNode.asInstanceOf[SqlSelect]
> >>>> > > > > >> >
> >>>> > > > > >> > private val setSelectColumnsQuery = "SELECT age"
> >>>> > > > > >> > private val selectList =
> >>>> > > > > >> >
> >>>> > > > > >>
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
> >>>> > > > > >> > private val setFromTableQuery = "SELECT employee"
> >>>> > > > > >> > private val fromTable =
> >>>> > > > > >> >
> >>>> > > > >
> >>>> >
> >>>>
> planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
> >>>> > > > > >> >
> >>>> > > > > >> > sqlSelectStmt.setSelectList(selectList)
> >>>> > > > > >> > sqlSelectStmt.setFrom(fromTable)
> >>>> > > > > >> > private val finalQuery =
> >>>> sqlSelectStmt.asInstanceOf[SqlNode]
> >>>> > > > > >> >
> >>>> println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
> >>>> > > > > >> >
> >>>> > > > > >> > ```
> >>>> > > > > >> >
> >>>> > > > > >> > The only caveat I see is the need for a new planner for
> >>>> every
> >>>> > new
> >>>> > > > > query.
> >>>> > > > > >> > Should I do something else or is this expected?
> >>>> > > > > >> >
> >>>> > > > > >> > On a different note, I wanted to ask about how I can
> write
> >>>> my
> >>>> > own
> >>>> > > > > >> rules. Is
> >>>> > > > > >> > it possible to enforce following rules:
> >>>> > > > > >> > - Only return results above a certain threshold when
> using
> >>>> GROUP
> >>>> > > BY,
> >>>> > > > > for
> >>>> > > > > >> > example (HAVING COUNT(col1) > 100).
> >>>> > > > > >> > - Restrict the column on which joins can happen, or else
> >>>> throw
> >>>> > an
> >>>> > > > > error
> >>>> > > > > >> > (almost like analysisException in Spark)
> >>>> > > > > >> > - Restrict the columns that can be passed in a select
> >>>> statement
> >>>> > or
> >>>> > > > > else
> >>>> > > > > >> > throw an error (like in the point above)
> >>>> > > > > >> >
> >>>> > > > > >> > Not sure if it is feasible, please feel free to suggest.
> :)
> >>>> > Thanks
> >>>> > > > > again
> >>>> > > > > >> > for your time!
> >>>> > > > > >> >
> >>>> > > > > >> > With regards,
> >>>> > > > > >> > Soumyadeep Mukhopadhyay.
> >>>> > > > > >> >
> >>>> > > > > >> >
> >>>> > > > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <
> >>>> > askar.mulin@gmail.com
> >>>> > > >
> >>>> > > > > >> wrote:
> >>>> > > > > >> >
> >>>> > > > > >> > > Hey Soumyadeep,
> >>>> > > > > >> > > I think that can work with a few caveats.
> >>>> > > > > >> > > 0) Use the Planner from Frameworks
> >>>> > > > > >> > > <
> >>>> > > > > >> > >
> >>>> > > > > >> >
> >>>> > > > > >>
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
> >>>> > > > > >> > > >
> >>>> > > > > >> > > 1) Parse the “template query" into a syntax tree (a
> root
> >>>> > > SqlNode)
> >>>> > > > > but
> >>>> > > > > >> do
> >>>> > > > > >> > > not validate it
> >>>> > > > > >> > > 2) Cast the root SqlNode into a SqlSelect
> >>>> > > > > >> > > <
> >>>> > > > > >> > >
> >>>> > > > > >> >
> >>>> > > > > >>
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
> >>>> > > > > >> > > >
> >>>> > > > > >> > > (can
> >>>> > > > > >> > > verify that it's a SELECT by using getKind())
> >>>> > > > > >> > > 3) Use the setters from SqlSelect to modify the
> children
> >>>> of
> >>>> > root
> >>>> > > > > >> > SqlSelect
> >>>> > > > > >> > > node.
> >>>> > > > > >> > > 4) Unparse the root query back into a string
> >>>> > > (SqlNode.unparse()).
> >>>> > > > > >> > >
> >>>> > > > > >> > > Now the tricky part is 3), as you have to set proper
> >>>> SqlNode
> >>>> > > types
> >>>> > > > > as
> >>>> > > > > >> > > children of the SELECT node for column names, table
> >>>> names and
> >>>> > > for
> >>>> > > > > >> WHERE
> >>>> > > > > >> > > predicate. I don't remember them properly, but what you
> >>>> can do
> >>>> > > is
> >>>> > > > > >> parse a
> >>>> > > > > >> > > proper query (again, don't validate it as you'll then
> >>>> need
> >>>> > table
> >>>> > > > > >> > metadata),
> >>>> > > > > >> > > and check the kinds (getKind()) of children SqlNode's
> and
> >>>> > > replace
> >>>> > > > > them
> >>>> > > > > >> > with
> >>>> > > > > >> > > your replacements using setters of root SqlSelect node.
> >>>> > > > > >> > >
> >>>> > > > > >> > > This should work, but I'm not 100% certain as I'm
> unable
> >>>> to
> >>>> > > check
> >>>> > > > > >> right
> >>>> > > > > >> > > now.
> >>>> > > > > >> > >
> >>>> > > > > >> > > PS: Take care about dialects. Dialects are used in
> >>>> unparsing
> >>>> > as
> >>>> > > a
> >>>> > > > > >> > > "configuration" of SqlWriter, and can unparse the
> syntax
> >>>> tree
> >>>> > > > > >> differently
> >>>> > > > > >> > > based on the dialect you have chosen.
> >>>> > > > > >> > >
> >>>> > > > > >> > > Kind regards,
> >>>> > > > > >> > > Askar Bozcan
> >>>> > > > > >> > >
> >>>> > > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
> >>>> > > > > >> soumyamyb95@gmail.com>
> >>>> > > > > >> > > wrote:
> >>>> > > > > >> > >
> >>>> > > > > >> > > Hello All,
> >>>> > > > > >> > >
> >>>> > > > > >> > > I have just heard of Apache Calcite and was exploring
> the
> >>>> > > > > >> possibilities.
> >>>> > > > > >> > I
> >>>> > > > > >> > > wish to achieve the following, and wanted to check if
> my
> >>>> hunch
> >>>> > > is
> >>>> > > > > >> > correct:
> >>>> > > > > >> > > - Use a template to build SQL queries, like use
> >>>> jinja-sql or
> >>>> > > even
> >>>> > > > > >> pebble
> >>>> > > > > >> > > (interpret the SqlNode tree kind of structure from my
> >>>> template
> >>>> > > and
> >>>> > > > > >> then
> >>>> > > > > >> > add
> >>>> > > > > >> > > the necessary fields like table name and group by
> fields
> >>>> from
> >>>> > an
> >>>> > > > > >> input)
> >>>> > > > > >> > >
> >>>> > > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?"
> >>>> would be
> >>>> > > > inside
> >>>> > > > > a
> >>>> > > > > >> > query
> >>>> > > > > >> > > template (in Jinja-sql it may look like "SELECT
> >>>> > {{select_fields
> >>>> > > |
> >>>> > > > > >> > sqlsafe}}
> >>>> > > > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause |
> >>>> > sqlsafe}}"
> >>>> > > > and
> >>>> > > > > >> > values
> >>>> > > > > >> > > like 'select_fields' would be substituted at run-time
> >>>> from a
> >>>> > > > > >> processing
> >>>> > > > > >> > > engine like https://pypi.org/project/Jinja2/, but for
> >>>> Java)
> >>>> > and
> >>>> > > > the
> >>>> > > > > >> > output
> >>>> > > > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT
> >>>> NULL" with
> >>>> > > > some
> >>>> > > > > >> > dialect
> >>>> > > > > >> > > (like Snowflake or Big-Query).
> >>>> > > > > >> > >
> >>>> > > > > >> > > Is this possible? Any recommendations or suggestions
> are
> >>>> > > welcome.
> >>>> > > > > >> Even if
> >>>> > > > > >> > > the approach feels wrong please let me know. :)
> >>>> > > > > >> > > Thank you for your time and consideration.
> >>>> > > > > >> > >
> >>>> > > > > >> > > With regards,
> >>>> > > > > >> > > Soumyadeep Mukhopadhyay.
> >>>> > > > > >> > >
> >>>> > > > > >> >
> >>>> > > > > >>
> >>>> > > > > >
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> >>>
>

Re: Template SQL

Posted by Askar Bozcan <as...@gmail.com>.

Hello again! Sorry for the late reply.
Just to be on the same page, I have some questions because I am not 100%
what you are trying to accomplish.

Are you:
a) Wanting to use Calcite (Calcite-server in particular) as a "proxy DB" to
change incoming queries to have HAVING before passing the query to an
underlying database?
b) Just want to use Calcite to parse a SELECT query, and if it doesn't have
a HAVING query, modify it and use it again?

*In either case, what's your use case?*

Regarding answers to some of your questions:

Q: Is it not possible to add a node to a tree when the sql query has been
parsed to RelNode after validation?
A: You can, you can set the children of RelNode with
RelNode.replaceInput (though
the caveat here, is you're setting a whole sub-tree, so need to be careful
there)

Q: If I wish to apply a similar approach to a RelRule which I am trying to
enforce, is that not expected? (an anti-pattern)
A: Physical plan building is one part of Calcite I'm not too familiar with,
however I'd argue that it's an antipattern, as those are used for physical
planning. Physical plans are NOT meant to change the overall relation, and
are for optimizing the query (adapting the query to run efficiently &
correctly on different databases' different architectures)

Q: Is there a way to generate the RelBuilder version of the same Sql query?
(SqlToRelConverter?)
A: What do you mean by that? RelBuilder is meant for programmatically
building relational expressions (RelNode). This is just an alternative to
SQL parsing.

Regards,
Askar

On Thu, 25 May 2023 at 20:14, Soumyadeep Mukhopadhyay <so...@gmail.com>
wrote:

> Hey Askar,
>
> Hope you are doing well. If and when time permits please share your
> feedback. It would be really helpful!
>
> Thanks and Regards,
> Soumyadeep.
>
> On Wed, May 10, 2023 at 2:17 AM Soumyadeep Mukhopadhyay <
> soumyamyb95@gmail.com> wrote:
>
>> Hey Askar,
>>
>> I am happy to inform that I could make a little more progress, and here
>> is the gist -
>> https://gist.github.com/Soumyadeep-github/47c1131bf02149af995d8e3bccb3ee67
>> .
>>
>> Although I am able to do this, I don't think I have reached my goal yet.
>> A question that came to mind was this :
>> - Is it not possible to add a node to a tree when the sql query has been
>> parsed to RelNode after validation?
>> - If I wish to apply a similar approach to a RelRule which I am trying to
>> enforce, is that not expected? (an anti-pattern)
>> - Is there a way to generate the RelBuilder version of the same Sql
>> query? (SqlToRelConverter?)
>>
>> My approach in this gist :
>> get a SqlNode (parsed query) ==> visit each SqlCall node recursively ==>
>> find SqlSelect (since there's getHaving and setHaving) ==> create a new
>> dummy query and extract having from the query ==>
>> set Having for the current query at hand via the dummy query.
>> Note : For no particular reason I have used only visit by SqlCall, was
>> trying to build up on what you shared and the other visit methods didn't
>> seem to fit.
>>
>> The approach I have in mind for a RelRule :
>> use ConverterRule ==> find the node under onMatch as Aggregate ==>
>> then use call.transformTo to add a having clause to the existing
>> aggregate node if there's no having clause present using the RelBuilder,
>> but that does not seem feasible.
>> (below is what I was thinking, original query : SELECT o_custkey, COUNT(*)
>> AS C FROM orders GROUP BY o_custkey)
>> [image: Screenshot 2023-05-10 at 2.09.41 AM.png]
>>
>> Maybe I am missing something. Please feel free to share your comments. :)
>> Thank you again!
>>
>> With regards,
>> Soumyadeep Mukhopadhyay.
>>
>> On Thu, May 4, 2023 at 7:43 AM Soumyadeep Mukhopadhyay <
>> soumyamyb95@gmail.com> wrote:
>>
>>> Hey Askar,
>>>
>>> Thank you so much for your email. Please take as much time as you need.
>>> I will keep trying in the meantime. :)
>>>
>>> With regards,
>>> Soumyadeep Mukhopadhyay.
>>>
>>> On Thu, 4 May 2023 at 3:04 AM, Askar Bozcan <as...@gmail.com>
>>> wrote:
>>>
>>>> Hey, sorry for the late reply but I had some stuff to do IRL & some
>>>> deadlines I need to meet right now. I'll try to answer you as soon as I
>>>> can, but this week doesn't seem possible.
>>>>
>>>> Keep the questions coming, however, if you haven't solved them yourself
>>>> (and if you did, please share your solutions). I'm planning to write a
>>>> sort-of "Comprehensive Calcite 101" documentation in the near future
>>>> which
>>>> I am going to be basing on my own experiences & questions in the
>>>> community.
>>>>
>>>> Kind regards,
>>>> Askar
>>>>
>>>> On Mon, 24 Apr 2023 at 22:52, Soumyadeep Mukhopadhyay <
>>>> soumyamyb95@gmail.com>
>>>> wrote:
>>>>
>>>> > Hello Askar,
>>>> >
>>>> > Thank you so much for guiding me, I have taken some inspiration from
>>>> your
>>>> > code and from Dremio's implementation and compiled this -
>>>> >
>>>> https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b
>>>> > .
>>>> >
>>>> > I have a few follow-up questions, if I may, regarding what I have
>>>> > implemented:
>>>> >
>>>> > - is it possible to implement this as a rule, probably a RelRule (I am
>>>> > saying a RelRule because as far as I have understood RelRule is the
>>>> way you
>>>> > implement rules that you would like to enforce before the physical
>>>> plan is
>>>> > created)?
>>>> > *goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120 is
>>>> > present, if not then add it in the query (probably transform it).*
>>>> > *research done so far* : I probably need to create a RelRule and
>>>> implement
>>>> > the "matches" and "onMatch" methods. The "onMatch" method should
>>>> transform
>>>> > my rel or rex node using transformTo, probably.
>>>> > Something like
>>>> >
>>>> >
>>>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html
>>>> > is
>>>> > something which I might need to implement.
>>>> > But a few methods are hard to grasp for me and they are:
>>>> >
>>>> >  - transformTo
>>>> >  - RelRule.Config.withOperandFor()
>>>> >
>>>> > Could you please give me some insight on how I can understand their
>>>> > structures?
>>>> >
>>>> > - is it possible to find whether a node (sql/rel) is GROUP BY or not?
>>>> > - if there is an AND clause in WHERE, is there a way to parse all the
>>>> > filter conditions separately? (using RexCall.getOperator did not work)
>>>> > - is this (what I have done) only a hack or is there a better way to
>>>> > implement this?
>>>> >
>>>> > I understand that my questions are probably not trivial, so I
>>>> appreciate
>>>> > that you are reading my emails and providing your valuable feedback.
>>>> >
>>>> > Thank you for your time and consideration. :)
>>>> >
>>>> > With regards,
>>>> > Soumyadeep Mukhopadhyay.
>>>> >
>>>> >
>>>> >
>>>> > On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <as...@gmail.com>
>>>> > wrote:
>>>> >
>>>> > > Thank you for your kind words :)
>>>> > >
>>>> > > - Askar
>>>> > >
>>>> > > On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay <
>>>> > soumyamyb95@gmail.com
>>>> > > >
>>>> > > wrote:
>>>> > >
>>>> > > > Hello Askar,
>>>> > > >
>>>> > > > Thank you so much for taking the time to compile this for me.
>>>> > > >
>>>> > > > I shall be indebted to you for this effort. Please let me know if
>>>> I can
>>>> > > be
>>>> > > > of any assistance ever to repay your kindness.
>>>> > > >
>>>> > > > I appreciate all your help and guidance. Thank you once again. I
>>>> shall
>>>> > go
>>>> > > > through the same and will get back to you as soon as possible.
>>>> Thank
>>>> > you
>>>> > > > again sir! :)
>>>> > > >
>>>> > > > With best regards,
>>>> > > > Soumyadeep Mukhopadhyay.
>>>> > > >
>>>> > > >
>>>> > > > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <
>>>> askar.mulin@gmail.com>
>>>> > > > wrote:
>>>> > > >
>>>> > > > > Hello again!
>>>> > > > > I have prepared this gist to perhaps help you understand how to
>>>> > > traverse
>>>> > > > > through a Sql parse tree:
>>>> > > > >
>>>> https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
>>>> > > > >
>>>> > > > > I have implemented a Filterer class that returns false on any
>>>> queries
>>>> > > > which
>>>> > > > > have 'HAVING COUNT > x' where x is <= 100, and true for all
>>>> other
>>>> > > > queries.
>>>> > > > > I have used Kotlin, so if you have difficulties understanding
>>>> it (as
>>>> > > > you're
>>>> > > > > using Scala), check out Kotlin's smart cast and nullability.
>>>> > > > >
>>>> > > > > Good luck and hopefully that was helpful,
>>>> > > > > Askar Bozcan
>>>> > > > >
>>>> > > > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
>>>> > > > > soumyamyb95@gmail.com>
>>>> > > > > wrote:
>>>> > > > >
>>>> > > > > > Hi Askar,
>>>> > > > > >
>>>> > > > > > I have gone through your email a couple of times. I am trying
>>>> to
>>>> > > > > > understand it bit by bit.
>>>> > > > > >
>>>> > > > > > I have a request, please feel free to say no, is there any
>>>> code
>>>> > base
>>>> > > > > where
>>>> > > > > > I can refer and understand how to implement my own filter
>>>> rule?
>>>> > > > > > What I realised I asked wrong is I wanted to implement
>>>> "(HAVING
>>>> > > > COUNT(*)
>>>> > > > > > > 100)" as a global rule, if there are any aggregation
>>>> queries.
>>>> > > > > >
>>>> > > > > > It is a bit difficult for me to understand where I should
>>>> start.
>>>> > > > > > For example, if I follow this
>>>> > > > > https://github.com/zabetak/calcite-tutorial where
>>>> > > > > > shall I start looking at?
>>>> > > > > > In the below image I feel like some kind of comparison is
>>>> going on,
>>>> > > > > should
>>>> > > > > > I devise my code like this? (in order to implement something
>>>> like
>>>> > > > > "(HAVING
>>>> > > > > > COUNT(*) > 100)")
>>>> > > > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
>>>> > > > > >
>>>> > > > > > Also is there any documentation I can go through regarding
>>>> how I
>>>> > can
>>>> > > > > > traverse through the AST?
>>>> > > > > > Probably I am not looking in the right places but so far I
>>>> could
>>>> > only
>>>> > > > go
>>>> > > > > > through Tab9 code examples (or the documentation provided by
>>>> the
>>>> > > > Calcite
>>>> > > > > > website) and things did not seem to be clear.
>>>> > > > > > I realise it is probably a lot to ask, so whatever you share
>>>> will
>>>> > be
>>>> > > a
>>>> > > > > lot
>>>> > > > > > of help for me.
>>>> > > > > >
>>>> > > > > > Thanks again for your time, patience and help!
>>>> > > > > >
>>>> > > > > > With regards,
>>>> > > > > > Soumyadeep Mukhopadhyay.
>>>> > > > > >
>>>> > > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <
>>>> askar.mulin@gmail.com
>>>> > >
>>>> > > > > wrote:
>>>> > > > > >
>>>> > > > > >> Hey,
>>>> > > > > >> You can use Planner.reset(). Note that it has mutable state
>>>> > inside,
>>>> > > so
>>>> > > > > do
>>>> > > > > >> not reuse the same planner instance in any multi-threading
>>>> > > > environment.
>>>> > > > > >>
>>>> > > > > >> (I am assuming you have access to table metadata so that you
>>>> will
>>>> > be
>>>> > > > > able
>>>> > > > > >> to convert your SqlNode tree into RelNode tree, relational
>>>> > > expression
>>>> > > > > >> tree)
>>>> > > > > >> - Only return results above a certain threshold when using
>>>> GROUP
>>>> > BY,
>>>> > > > for
>>>> > > > > >> example (HAVING COUNT(col1) > 100).
>>>> > > > > >>
>>>> > > > > >> I'm not quite sure I understand this question is HAVING part
>>>> of
>>>> > the
>>>> > > > > query?
>>>> > > > > >>
>>>> > > > > >> - Restrict the column on which joins can happen, or else
>>>> throw an
>>>> > > > error
>>>> > > > > >> (almost like analysisException in Spark)
>>>> > > > > >>
>>>> > > > > >> Do you have access to table/schema metadata?
>>>> > > > > >> If you do:
>>>> > > > > >> 1) Convert your parsed syntax tree (SqlNode) into a logical
>>>> > > relational
>>>> > > > > >> tree
>>>> > > > > >> (RelNode).
>>>> > > > > >> Watch this tutorial by Stamatis:
>>>> > > > > >> https://www.youtube.com/watch?v=p1O3E33FIs8.
>>>> > > > > >> It will explain way better the usage than I can in an email.
>>>> (And
>>>> > it
>>>> > > > > what
>>>> > > > > >> introduced me to Calcite's basics :))
>>>> > > > > >> 2) Traverse your relational tree by implementing RelShuttle
>>>> > > > > >> <
>>>> > > > > >>
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
>>>> > > > > >> >.
>>>> > > > > >> (Look up Visitor pattern to better understand how this
>>>> recursive
>>>> > > > > traversal
>>>> > > > > >> works)
>>>> > > > > >> 3) Ignore any RelNode's which are not LogicalJoin.
>>>> > > > > >> 4) When you encounter LogicalJoin, traverse its children with
>>>> > > getLeft
>>>> > > > > and
>>>> > > > > >> getRight
>>>> > > > > >> 5) When you encounter LogicalTableScan,  You can get its
>>>> > RelOptTable
>>>> > > > > with
>>>> > > > > >> getTable, and then RelOptTable.getRowType() to find the
>>>> fields.
>>>> > > > > >> (Not 100% about step 5, but should be close to it)
>>>> > > > > >>
>>>> > > > > >>
>>>> > > > > >> - Restrict the columns that can be passed in a select
>>>> statement or
>>>> > > > else
>>>> > > > > >> throw an error (like in the point above)
>>>> > > > > >> Same logic as above. But instead of TableScan, look at the
>>>> root
>>>> > > > > >> LogicalProject.
>>>> > > > > >>
>>>> > > > > >> All of the above should be doable with just the parse tree
>>>> > (SqlNode)
>>>> > > > and
>>>> > > > > >> without access to table metadata using SqlShuttle; however,
>>>> it's
>>>> > > > easier
>>>> > > > > >> and
>>>> > > > > >> less error-prone with relational tree IMO.
>>>> > > > > >>
>>>> > > > > >> - Askar
>>>> > > > > >>
>>>> > > > > >>
>>>> > > > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
>>>> > > > > >> soumyamyb95@gmail.com>
>>>> > > > > >> wrote:
>>>> > > > > >>
>>>> > > > > >> > Hey Askar,
>>>> > > > > >> >
>>>> > > > > >> > It worked exactly how you suggested.
>>>> > > > > >> >
>>>> > > > > >> > ```
>>>> > > > > >> >
>>>> > > > > >> > private val sqlQuery: String = "SELECT list_of_columns FROM
>>>> > > > table_name
>>>> > > > > >> > WHERE predicate_column = 'predicate_value'"
>>>> > > > > >> > private val frameworkConfig: FrameworkConfig =
>>>> > > > > >> > Frameworks.newConfigBuilder.build()
>>>> > > > > >> > private val planner: Planner =
>>>> > > > Frameworks.getPlanner(frameworkConfig)
>>>> > > > > >> > private val planner2: Planner =
>>>> > > > Frameworks.getPlanner(frameworkConfig)
>>>> > > > > >> > private val planner3: Planner =
>>>> > > > Frameworks.getPlanner(frameworkConfig)
>>>> > > > > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
>>>> > > > > >> > println(sqlNode.getKind)
>>>> > > > > >> > private val sqlSelectStmt: SqlSelect =
>>>> > > > sqlNode.asInstanceOf[SqlSelect]
>>>> > > > > >> >
>>>> > > > > >> > private val setSelectColumnsQuery = "SELECT age"
>>>> > > > > >> > private val selectList =
>>>> > > > > >> >
>>>> > > > > >>
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
>>>> > > > > >> > private val setFromTableQuery = "SELECT employee"
>>>> > > > > >> > private val fromTable =
>>>> > > > > >> >
>>>> > > > >
>>>> >
>>>> planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
>>>> > > > > >> >
>>>> > > > > >> > sqlSelectStmt.setSelectList(selectList)
>>>> > > > > >> > sqlSelectStmt.setFrom(fromTable)
>>>> > > > > >> > private val finalQuery =
>>>> sqlSelectStmt.asInstanceOf[SqlNode]
>>>> > > > > >> >
>>>> println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
>>>> > > > > >> >
>>>> > > > > >> > ```
>>>> > > > > >> >
>>>> > > > > >> > The only caveat I see is the need for a new planner for
>>>> every
>>>> > new
>>>> > > > > query.
>>>> > > > > >> > Should I do something else or is this expected?
>>>> > > > > >> >
>>>> > > > > >> > On a different note, I wanted to ask about how I can write
>>>> my
>>>> > own
>>>> > > > > >> rules. Is
>>>> > > > > >> > it possible to enforce following rules:
>>>> > > > > >> > - Only return results above a certain threshold when using
>>>> GROUP
>>>> > > BY,
>>>> > > > > for
>>>> > > > > >> > example (HAVING COUNT(col1) > 100).
>>>> > > > > >> > - Restrict the column on which joins can happen, or else
>>>> throw
>>>> > an
>>>> > > > > error
>>>> > > > > >> > (almost like analysisException in Spark)
>>>> > > > > >> > - Restrict the columns that can be passed in a select
>>>> statement
>>>> > or
>>>> > > > > else
>>>> > > > > >> > throw an error (like in the point above)
>>>> > > > > >> >
>>>> > > > > >> > Not sure if it is feasible, please feel free to suggest. :)
>>>> > Thanks
>>>> > > > > again
>>>> > > > > >> > for your time!
>>>> > > > > >> >
>>>> > > > > >> > With regards,
>>>> > > > > >> > Soumyadeep Mukhopadhyay.
>>>> > > > > >> >
>>>> > > > > >> >
>>>> > > > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <
>>>> > askar.mulin@gmail.com
>>>> > > >
>>>> > > > > >> wrote:
>>>> > > > > >> >
>>>> > > > > >> > > Hey Soumyadeep,
>>>> > > > > >> > > I think that can work with a few caveats.
>>>> > > > > >> > > 0) Use the Planner from Frameworks
>>>> > > > > >> > > <
>>>> > > > > >> > >
>>>> > > > > >> >
>>>> > > > > >>
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
>>>> > > > > >> > > >
>>>> > > > > >> > > 1) Parse the “template query" into a syntax tree (a root
>>>> > > SqlNode)
>>>> > > > > but
>>>> > > > > >> do
>>>> > > > > >> > > not validate it
>>>> > > > > >> > > 2) Cast the root SqlNode into a SqlSelect
>>>> > > > > >> > > <
>>>> > > > > >> > >
>>>> > > > > >> >
>>>> > > > > >>
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
>>>> > > > > >> > > >
>>>> > > > > >> > > (can
>>>> > > > > >> > > verify that it's a SELECT by using getKind())
>>>> > > > > >> > > 3) Use the setters from SqlSelect to modify the children
>>>> of
>>>> > root
>>>> > > > > >> > SqlSelect
>>>> > > > > >> > > node.
>>>> > > > > >> > > 4) Unparse the root query back into a string
>>>> > > (SqlNode.unparse()).
>>>> > > > > >> > >
>>>> > > > > >> > > Now the tricky part is 3), as you have to set proper
>>>> SqlNode
>>>> > > types
>>>> > > > > as
>>>> > > > > >> > > children of the SELECT node for column names, table
>>>> names and
>>>> > > for
>>>> > > > > >> WHERE
>>>> > > > > >> > > predicate. I don't remember them properly, but what you
>>>> can do
>>>> > > is
>>>> > > > > >> parse a
>>>> > > > > >> > > proper query (again, don't validate it as you'll then
>>>> need
>>>> > table
>>>> > > > > >> > metadata),
>>>> > > > > >> > > and check the kinds (getKind()) of children SqlNode's and
>>>> > > replace
>>>> > > > > them
>>>> > > > > >> > with
>>>> > > > > >> > > your replacements using setters of root SqlSelect node.
>>>> > > > > >> > >
>>>> > > > > >> > > This should work, but I'm not 100% certain as I'm unable
>>>> to
>>>> > > check
>>>> > > > > >> right
>>>> > > > > >> > > now.
>>>> > > > > >> > >
>>>> > > > > >> > > PS: Take care about dialects. Dialects are used in
>>>> unparsing
>>>> > as
>>>> > > a
>>>> > > > > >> > > "configuration" of SqlWriter, and can unparse the syntax
>>>> tree
>>>> > > > > >> differently
>>>> > > > > >> > > based on the dialect you have chosen.
>>>> > > > > >> > >
>>>> > > > > >> > > Kind regards,
>>>> > > > > >> > > Askar Bozcan
>>>> > > > > >> > >
>>>> > > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
>>>> > > > > >> soumyamyb95@gmail.com>
>>>> > > > > >> > > wrote:
>>>> > > > > >> > >
>>>> > > > > >> > > Hello All,
>>>> > > > > >> > >
>>>> > > > > >> > > I have just heard of Apache Calcite and was exploring the
>>>> > > > > >> possibilities.
>>>> > > > > >> > I
>>>> > > > > >> > > wish to achieve the following, and wanted to check if my
>>>> hunch
>>>> > > is
>>>> > > > > >> > correct:
>>>> > > > > >> > > - Use a template to build SQL queries, like use
>>>> jinja-sql or
>>>> > > even
>>>> > > > > >> pebble
>>>> > > > > >> > > (interpret the SqlNode tree kind of structure from my
>>>> template
>>>> > > and
>>>> > > > > >> then
>>>> > > > > >> > add
>>>> > > > > >> > > the necessary fields like table name and group by fields
>>>> from
>>>> > an
>>>> > > > > >> input)
>>>> > > > > >> > >
>>>> > > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?"
>>>> would be
>>>> > > > inside
>>>> > > > > a
>>>> > > > > >> > query
>>>> > > > > >> > > template (in Jinja-sql it may look like "SELECT
>>>> > {{select_fields
>>>> > > |
>>>> > > > > >> > sqlsafe}}
>>>> > > > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause |
>>>> > sqlsafe}}"
>>>> > > > and
>>>> > > > > >> > values
>>>> > > > > >> > > like 'select_fields' would be substituted at run-time
>>>> from a
>>>> > > > > >> processing
>>>> > > > > >> > > engine like https://pypi.org/project/Jinja2/, but for
>>>> Java)
>>>> > and
>>>> > > > the
>>>> > > > > >> > output
>>>> > > > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT
>>>> NULL" with
>>>> > > > some
>>>> > > > > >> > dialect
>>>> > > > > >> > > (like Snowflake or Big-Query).
>>>> > > > > >> > >
>>>> > > > > >> > > Is this possible? Any recommendations or suggestions are
>>>> > > welcome.
>>>> > > > > >> Even if
>>>> > > > > >> > > the approach feels wrong please let me know. :)
>>>> > > > > >> > > Thank you for your time and consideration.
>>>> > > > > >> > >
>>>> > > > > >> > > With regards,
>>>> > > > > >> > > Soumyadeep Mukhopadhyay.
>>>> > > > > >> > >
>>>> > > > > >> >
>>>> > > > > >>
>>>> > > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>

Re: Template SQL

Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.

Hey Askar,

Hope you are doing well. If and when time permits please share your
feedback. It would be really helpful!

Thanks and Regards,
Soumyadeep.

On Wed, May 10, 2023 at 2:17 AM Soumyadeep Mukhopadhyay <
soumyamyb95@gmail.com> wrote:

> Hey Askar,
>
> I am happy to inform that I could make a little more progress, and here is
> the gist -
> https://gist.github.com/Soumyadeep-github/47c1131bf02149af995d8e3bccb3ee67
> .
>
> Although I am able to do this, I don't think I have reached my goal yet.
> A question that came to mind was this :
> - Is it not possible to add a node to a tree when the sql query has been
> parsed to RelNode after validation?
> - If I wish to apply a similar approach to a RelRule which I am trying to
> enforce, is that not expected? (an anti-pattern)
> - Is there a way to generate the RelBuilder version of the same Sql query?
> (SqlToRelConverter?)
>
> My approach in this gist :
> get a SqlNode (parsed query) ==> visit each SqlCall node recursively ==>
> find SqlSelect (since there's getHaving and setHaving) ==> create a new
> dummy query and extract having from the query ==>
> set Having for the current query at hand via the dummy query.
> Note : For no particular reason I have used only visit by SqlCall, was
> trying to build up on what you shared and the other visit methods didn't
> seem to fit.
>
> The approach I have in mind for a RelRule :
> use ConverterRule ==> find the node under onMatch as Aggregate ==>
> then use call.transformTo to add a having clause to the existing aggregate
> node if there's no having clause present using the RelBuilder, but that
> does not seem feasible.
> (below is what I was thinking, original query : SELECT o_custkey, COUNT(*)
> AS C FROM orders GROUP BY o_custkey)
> [image: Screenshot 2023-05-10 at 2.09.41 AM.png]
>
> Maybe I am missing something. Please feel free to share your comments. :)
> Thank you again!
>
> With regards,
> Soumyadeep Mukhopadhyay.
>
> On Thu, May 4, 2023 at 7:43 AM Soumyadeep Mukhopadhyay <
> soumyamyb95@gmail.com> wrote:
>
>> Hey Askar,
>>
>> Thank you so much for your email. Please take as much time as you need. I
>> will keep trying in the meantime. :)
>>
>> With regards,
>> Soumyadeep Mukhopadhyay.
>>
>> On Thu, 4 May 2023 at 3:04 AM, Askar Bozcan <as...@gmail.com>
>> wrote:
>>
>>> Hey, sorry for the late reply but I had some stuff to do IRL & some
>>> deadlines I need to meet right now. I'll try to answer you as soon as I
>>> can, but this week doesn't seem possible.
>>>
>>> Keep the questions coming, however, if you haven't solved them yourself
>>> (and if you did, please share your solutions). I'm planning to write a
>>> sort-of "Comprehensive Calcite 101" documentation in the near future
>>> which
>>> I am going to be basing on my own experiences & questions in the
>>> community.
>>>
>>> Kind regards,
>>> Askar
>>>
>>> On Mon, 24 Apr 2023 at 22:52, Soumyadeep Mukhopadhyay <
>>> soumyamyb95@gmail.com>
>>> wrote:
>>>
>>> > Hello Askar,
>>> >
>>> > Thank you so much for guiding me, I have taken some inspiration from
>>> your
>>> > code and from Dremio's implementation and compiled this -
>>> >
>>> https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b
>>> > .
>>> >
>>> > I have a few follow-up questions, if I may, regarding what I have
>>> > implemented:
>>> >
>>> > - is it possible to implement this as a rule, probably a RelRule (I am
>>> > saying a RelRule because as far as I have understood RelRule is the
>>> way you
>>> > implement rules that you would like to enforce before the physical
>>> plan is
>>> > created)?
>>> > *goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120 is
>>> > present, if not then add it in the query (probably transform it).*
>>> > *research done so far* : I probably need to create a RelRule and
>>> implement
>>> > the "matches" and "onMatch" methods. The "onMatch" method should
>>> transform
>>> > my rel or rex node using transformTo, probably.
>>> > Something like
>>> >
>>> >
>>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html
>>> > is
>>> > something which I might need to implement.
>>> > But a few methods are hard to grasp for me and they are:
>>> >
>>> >  - transformTo
>>> >  - RelRule.Config.withOperandFor()
>>> >
>>> > Could you please give me some insight on how I can understand their
>>> > structures?
>>> >
>>> > - is it possible to find whether a node (sql/rel) is GROUP BY or not?
>>> > - if there is an AND clause in WHERE, is there a way to parse all the
>>> > filter conditions separately? (using RexCall.getOperator did not work)
>>> > - is this (what I have done) only a hack or is there a better way to
>>> > implement this?
>>> >
>>> > I understand that my questions are probably not trivial, so I
>>> appreciate
>>> > that you are reading my emails and providing your valuable feedback.
>>> >
>>> > Thank you for your time and consideration. :)
>>> >
>>> > With regards,
>>> > Soumyadeep Mukhopadhyay.
>>> >
>>> >
>>> >
>>> > On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <as...@gmail.com>
>>> > wrote:
>>> >
>>> > > Thank you for your kind words :)
>>> > >
>>> > > - Askar
>>> > >
>>> > > On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay <
>>> > soumyamyb95@gmail.com
>>> > > >
>>> > > wrote:
>>> > >
>>> > > > Hello Askar,
>>> > > >
>>> > > > Thank you so much for taking the time to compile this for me.
>>> > > >
>>> > > > I shall be indebted to you for this effort. Please let me know if
>>> I can
>>> > > be
>>> > > > of any assistance ever to repay your kindness.
>>> > > >
>>> > > > I appreciate all your help and guidance. Thank you once again. I
>>> shall
>>> > go
>>> > > > through the same and will get back to you as soon as possible.
>>> Thank
>>> > you
>>> > > > again sir! :)
>>> > > >
>>> > > > With best regards,
>>> > > > Soumyadeep Mukhopadhyay.
>>> > > >
>>> > > >
>>> > > > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <
>>> askar.mulin@gmail.com>
>>> > > > wrote:
>>> > > >
>>> > > > > Hello again!
>>> > > > > I have prepared this gist to perhaps help you understand how to
>>> > > traverse
>>> > > > > through a Sql parse tree:
>>> > > > >
>>> https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
>>> > > > >
>>> > > > > I have implemented a Filterer class that returns false on any
>>> queries
>>> > > > which
>>> > > > > have 'HAVING COUNT > x' where x is <= 100, and true for all other
>>> > > > queries.
>>> > > > > I have used Kotlin, so if you have difficulties understanding it
>>> (as
>>> > > > you're
>>> > > > > using Scala), check out Kotlin's smart cast and nullability.
>>> > > > >
>>> > > > > Good luck and hopefully that was helpful,
>>> > > > > Askar Bozcan
>>> > > > >
>>> > > > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
>>> > > > > soumyamyb95@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Hi Askar,
>>> > > > > >
>>> > > > > > I have gone through your email a couple of times. I am trying
>>> to
>>> > > > > > understand it bit by bit.
>>> > > > > >
>>> > > > > > I have a request, please feel free to say no, is there any code
>>> > base
>>> > > > > where
>>> > > > > > I can refer and understand how to implement my own filter rule?
>>> > > > > > What I realised I asked wrong is I wanted to implement "(HAVING
>>> > > > COUNT(*)
>>> > > > > > > 100)" as a global rule, if there are any aggregation queries.
>>> > > > > >
>>> > > > > > It is a bit difficult for me to understand where I should
>>> start.
>>> > > > > > For example, if I follow this
>>> > > > > https://github.com/zabetak/calcite-tutorial where
>>> > > > > > shall I start looking at?
>>> > > > > > In the below image I feel like some kind of comparison is
>>> going on,
>>> > > > > should
>>> > > > > > I devise my code like this? (in order to implement something
>>> like
>>> > > > > "(HAVING
>>> > > > > > COUNT(*) > 100)")
>>> > > > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
>>> > > > > >
>>> > > > > > Also is there any documentation I can go through regarding how
>>> I
>>> > can
>>> > > > > > traverse through the AST?
>>> > > > > > Probably I am not looking in the right places but so far I
>>> could
>>> > only
>>> > > > go
>>> > > > > > through Tab9 code examples (or the documentation provided by
>>> the
>>> > > > Calcite
>>> > > > > > website) and things did not seem to be clear.
>>> > > > > > I realise it is probably a lot to ask, so whatever you share
>>> will
>>> > be
>>> > > a
>>> > > > > lot
>>> > > > > > of help for me.
>>> > > > > >
>>> > > > > > Thanks again for your time, patience and help!
>>> > > > > >
>>> > > > > > With regards,
>>> > > > > > Soumyadeep Mukhopadhyay.
>>> > > > > >
>>> > > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <
>>> askar.mulin@gmail.com
>>> > >
>>> > > > > wrote:
>>> > > > > >
>>> > > > > >> Hey,
>>> > > > > >> You can use Planner.reset(). Note that it has mutable state
>>> > inside,
>>> > > so
>>> > > > > do
>>> > > > > >> not reuse the same planner instance in any multi-threading
>>> > > > environment.
>>> > > > > >>
>>> > > > > >> (I am assuming you have access to table metadata so that you
>>> will
>>> > be
>>> > > > > able
>>> > > > > >> to convert your SqlNode tree into RelNode tree, relational
>>> > > expression
>>> > > > > >> tree)
>>> > > > > >> - Only return results above a certain threshold when using
>>> GROUP
>>> > BY,
>>> > > > for
>>> > > > > >> example (HAVING COUNT(col1) > 100).
>>> > > > > >>
>>> > > > > >> I'm not quite sure I understand this question is HAVING part
>>> of
>>> > the
>>> > > > > query?
>>> > > > > >>
>>> > > > > >> - Restrict the column on which joins can happen, or else
>>> throw an
>>> > > > error
>>> > > > > >> (almost like analysisException in Spark)
>>> > > > > >>
>>> > > > > >> Do you have access to table/schema metadata?
>>> > > > > >> If you do:
>>> > > > > >> 1) Convert your parsed syntax tree (SqlNode) into a logical
>>> > > relational
>>> > > > > >> tree
>>> > > > > >> (RelNode).
>>> > > > > >> Watch this tutorial by Stamatis:
>>> > > > > >> https://www.youtube.com/watch?v=p1O3E33FIs8.
>>> > > > > >> It will explain way better the usage than I can in an email.
>>> (And
>>> > it
>>> > > > > what
>>> > > > > >> introduced me to Calcite's basics :))
>>> > > > > >> 2) Traverse your relational tree by implementing RelShuttle
>>> > > > > >> <
>>> > > > > >>
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
>>> > > > > >> >.
>>> > > > > >> (Look up Visitor pattern to better understand how this
>>> recursive
>>> > > > > traversal
>>> > > > > >> works)
>>> > > > > >> 3) Ignore any RelNode's which are not LogicalJoin.
>>> > > > > >> 4) When you encounter LogicalJoin, traverse its children with
>>> > > getLeft
>>> > > > > and
>>> > > > > >> getRight
>>> > > > > >> 5) When you encounter LogicalTableScan,  You can get its
>>> > RelOptTable
>>> > > > > with
>>> > > > > >> getTable, and then RelOptTable.getRowType() to find the
>>> fields.
>>> > > > > >> (Not 100% about step 5, but should be close to it)
>>> > > > > >>
>>> > > > > >>
>>> > > > > >> - Restrict the columns that can be passed in a select
>>> statement or
>>> > > > else
>>> > > > > >> throw an error (like in the point above)
>>> > > > > >> Same logic as above. But instead of TableScan, look at the
>>> root
>>> > > > > >> LogicalProject.
>>> > > > > >>
>>> > > > > >> All of the above should be doable with just the parse tree
>>> > (SqlNode)
>>> > > > and
>>> > > > > >> without access to table metadata using SqlShuttle; however,
>>> it's
>>> > > > easier
>>> > > > > >> and
>>> > > > > >> less error-prone with relational tree IMO.
>>> > > > > >>
>>> > > > > >> - Askar
>>> > > > > >>
>>> > > > > >>
>>> > > > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
>>> > > > > >> soumyamyb95@gmail.com>
>>> > > > > >> wrote:
>>> > > > > >>
>>> > > > > >> > Hey Askar,
>>> > > > > >> >
>>> > > > > >> > It worked exactly how you suggested.
>>> > > > > >> >
>>> > > > > >> > ```
>>> > > > > >> >
>>> > > > > >> > private val sqlQuery: String = "SELECT list_of_columns FROM
>>> > > > table_name
>>> > > > > >> > WHERE predicate_column = 'predicate_value'"
>>> > > > > >> > private val frameworkConfig: FrameworkConfig =
>>> > > > > >> > Frameworks.newConfigBuilder.build()
>>> > > > > >> > private val planner: Planner =
>>> > > > Frameworks.getPlanner(frameworkConfig)
>>> > > > > >> > private val planner2: Planner =
>>> > > > Frameworks.getPlanner(frameworkConfig)
>>> > > > > >> > private val planner3: Planner =
>>> > > > Frameworks.getPlanner(frameworkConfig)
>>> > > > > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
>>> > > > > >> > println(sqlNode.getKind)
>>> > > > > >> > private val sqlSelectStmt: SqlSelect =
>>> > > > sqlNode.asInstanceOf[SqlSelect]
>>> > > > > >> >
>>> > > > > >> > private val setSelectColumnsQuery = "SELECT age"
>>> > > > > >> > private val selectList =
>>> > > > > >> >
>>> > > > > >>
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
>>> > > > > >> > private val setFromTableQuery = "SELECT employee"
>>> > > > > >> > private val fromTable =
>>> > > > > >> >
>>> > > > >
>>> > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
>>> > > > > >> >
>>> > > > > >> > sqlSelectStmt.setSelectList(selectList)
>>> > > > > >> > sqlSelectStmt.setFrom(fromTable)
>>> > > > > >> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode]
>>> > > > > >> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
>>> > > > > >> >
>>> > > > > >> > ```
>>> > > > > >> >
>>> > > > > >> > The only caveat I see is the need for a new planner for
>>> every
>>> > new
>>> > > > > query.
>>> > > > > >> > Should I do something else or is this expected?
>>> > > > > >> >
>>> > > > > >> > On a different note, I wanted to ask about how I can write
>>> my
>>> > own
>>> > > > > >> rules. Is
>>> > > > > >> > it possible to enforce following rules:
>>> > > > > >> > - Only return results above a certain threshold when using
>>> GROUP
>>> > > BY,
>>> > > > > for
>>> > > > > >> > example (HAVING COUNT(col1) > 100).
>>> > > > > >> > - Restrict the column on which joins can happen, or else
>>> throw
>>> > an
>>> > > > > error
>>> > > > > >> > (almost like analysisException in Spark)
>>> > > > > >> > - Restrict the columns that can be passed in a select
>>> statement
>>> > or
>>> > > > > else
>>> > > > > >> > throw an error (like in the point above)
>>> > > > > >> >
>>> > > > > >> > Not sure if it is feasible, please feel free to suggest. :)
>>> > Thanks
>>> > > > > again
>>> > > > > >> > for your time!
>>> > > > > >> >
>>> > > > > >> > With regards,
>>> > > > > >> > Soumyadeep Mukhopadhyay.
>>> > > > > >> >
>>> > > > > >> >
>>> > > > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <
>>> > askar.mulin@gmail.com
>>> > > >
>>> > > > > >> wrote:
>>> > > > > >> >
>>> > > > > >> > > Hey Soumyadeep,
>>> > > > > >> > > I think that can work with a few caveats.
>>> > > > > >> > > 0) Use the Planner from Frameworks
>>> > > > > >> > > <
>>> > > > > >> > >
>>> > > > > >> >
>>> > > > > >>
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
>>> > > > > >> > > >
>>> > > > > >> > > 1) Parse the “template query" into a syntax tree (a root
>>> > > SqlNode)
>>> > > > > but
>>> > > > > >> do
>>> > > > > >> > > not validate it
>>> > > > > >> > > 2) Cast the root SqlNode into a SqlSelect
>>> > > > > >> > > <
>>> > > > > >> > >
>>> > > > > >> >
>>> > > > > >>
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
>>> > > > > >> > > >
>>> > > > > >> > > (can
>>> > > > > >> > > verify that it's a SELECT by using getKind())
>>> > > > > >> > > 3) Use the setters from SqlSelect to modify the children
>>> of
>>> > root
>>> > > > > >> > SqlSelect
>>> > > > > >> > > node.
>>> > > > > >> > > 4) Unparse the root query back into a string
>>> > > (SqlNode.unparse()).
>>> > > > > >> > >
>>> > > > > >> > > Now the tricky part is 3), as you have to set proper
>>> SqlNode
>>> > > types
>>> > > > > as
>>> > > > > >> > > children of the SELECT node for column names, table names
>>> and
>>> > > for
>>> > > > > >> WHERE
>>> > > > > >> > > predicate. I don't remember them properly, but what you
>>> can do
>>> > > is
>>> > > > > >> parse a
>>> > > > > >> > > proper query (again, don't validate it as you'll then need
>>> > table
>>> > > > > >> > metadata),
>>> > > > > >> > > and check the kinds (getKind()) of children SqlNode's and
>>> > > replace
>>> > > > > them
>>> > > > > >> > with
>>> > > > > >> > > your replacements using setters of root SqlSelect node.
>>> > > > > >> > >
>>> > > > > >> > > This should work, but I'm not 100% certain as I'm unable
>>> to
>>> > > check
>>> > > > > >> right
>>> > > > > >> > > now.
>>> > > > > >> > >
>>> > > > > >> > > PS: Take care about dialects. Dialects are used in
>>> unparsing
>>> > as
>>> > > a
>>> > > > > >> > > "configuration" of SqlWriter, and can unparse the syntax
>>> tree
>>> > > > > >> differently
>>> > > > > >> > > based on the dialect you have chosen.
>>> > > > > >> > >
>>> > > > > >> > > Kind regards,
>>> > > > > >> > > Askar Bozcan
>>> > > > > >> > >
>>> > > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
>>> > > > > >> soumyamyb95@gmail.com>
>>> > > > > >> > > wrote:
>>> > > > > >> > >
>>> > > > > >> > > Hello All,
>>> > > > > >> > >
>>> > > > > >> > > I have just heard of Apache Calcite and was exploring the
>>> > > > > >> possibilities.
>>> > > > > >> > I
>>> > > > > >> > > wish to achieve the following, and wanted to check if my
>>> hunch
>>> > > is
>>> > > > > >> > correct:
>>> > > > > >> > > - Use a template to build SQL queries, like use jinja-sql
>>> or
>>> > > even
>>> > > > > >> pebble
>>> > > > > >> > > (interpret the SqlNode tree kind of structure from my
>>> template
>>> > > and
>>> > > > > >> then
>>> > > > > >> > add
>>> > > > > >> > > the necessary fields like table name and group by fields
>>> from
>>> > an
>>> > > > > >> input)
>>> > > > > >> > >
>>> > > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would
>>> be
>>> > > > inside
>>> > > > > a
>>> > > > > >> > query
>>> > > > > >> > > template (in Jinja-sql it may look like "SELECT
>>> > {{select_fields
>>> > > |
>>> > > > > >> > sqlsafe}}
>>> > > > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause |
>>> > sqlsafe}}"
>>> > > > and
>>> > > > > >> > values
>>> > > > > >> > > like 'select_fields' would be substituted at run-time
>>> from a
>>> > > > > >> processing
>>> > > > > >> > > engine like https://pypi.org/project/Jinja2/, but for
>>> Java)
>>> > and
>>> > > > the
>>> > > > > >> > output
>>> > > > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL"
>>> with
>>> > > > some
>>> > > > > >> > dialect
>>> > > > > >> > > (like Snowflake or Big-Query).
>>> > > > > >> > >
>>> > > > > >> > > Is this possible? Any recommendations or suggestions are
>>> > > welcome.
>>> > > > > >> Even if
>>> > > > > >> > > the approach feels wrong please let me know. :)
>>> > > > > >> > > Thank you for your time and consideration.
>>> > > > > >> > >
>>> > > > > >> > > With regards,
>>> > > > > >> > > Soumyadeep Mukhopadhyay.
>>> > > > > >> > >
>>> > > > > >> >
>>> > > > > >>
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>

Re: Template SQL

Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.

Hey Askar,

I am happy to inform that I could make a little more progress, and here is
the gist -
https://gist.github.com/Soumyadeep-github/47c1131bf02149af995d8e3bccb3ee67.

Although I am able to do this, I don't think I have reached my goal yet.
A question that came to mind was this :
- Is it not possible to add a node to a tree when the sql query has been
parsed to RelNode after validation?
- If I wish to apply a similar approach to a RelRule which I am trying to
enforce, is that not expected? (an anti-pattern)
- Is there a way to generate the RelBuilder version of the same Sql query?
(SqlToRelConverter?)

My approach in this gist :
get a SqlNode (parsed query) ==> visit each SqlCall node recursively ==>
find SqlSelect (since there's getHaving and setHaving) ==> create a new
dummy query and extract having from the query ==>
set Having for the current query at hand via the dummy query.
Note : For no particular reason I have used only visit by SqlCall, was
trying to build up on what you shared and the other visit methods didn't
seem to fit.

The approach I have in mind for a RelRule :
use ConverterRule ==> find the node under onMatch as Aggregate ==>
then use call.transformTo to add a having clause to the existing aggregate
node if there's no having clause present using the RelBuilder, but that
does not seem feasible.
(below is what I was thinking, original query : SELECT o_custkey, COUNT(*) AS
C FROM orders GROUP BY o_custkey)
[image: Screenshot 2023-05-10 at 2.09.41 AM.png]

Maybe I am missing something. Please feel free to share your comments. :)
Thank you again!

With regards,
Soumyadeep Mukhopadhyay.

On Thu, May 4, 2023 at 7:43 AM Soumyadeep Mukhopadhyay <
soumyamyb95@gmail.com> wrote:

> Hey Askar,
>
> Thank you so much for your email. Please take as much time as you need. I
> will keep trying in the meantime. :)
>
> With regards,
> Soumyadeep Mukhopadhyay.
>
> On Thu, 4 May 2023 at 3:04 AM, Askar Bozcan <as...@gmail.com> wrote:
>
>> Hey, sorry for the late reply but I had some stuff to do IRL & some
>> deadlines I need to meet right now. I'll try to answer you as soon as I
>> can, but this week doesn't seem possible.
>>
>> Keep the questions coming, however, if you haven't solved them yourself
>> (and if you did, please share your solutions). I'm planning to write a
>> sort-of "Comprehensive Calcite 101" documentation in the near future which
>> I am going to be basing on my own experiences & questions in the
>> community.
>>
>> Kind regards,
>> Askar
>>
>> On Mon, 24 Apr 2023 at 22:52, Soumyadeep Mukhopadhyay <
>> soumyamyb95@gmail.com>
>> wrote:
>>
>> > Hello Askar,
>> >
>> > Thank you so much for guiding me, I have taken some inspiration from
>> your
>> > code and from Dremio's implementation and compiled this -
>> >
>> https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b
>> > .
>> >
>> > I have a few follow-up questions, if I may, regarding what I have
>> > implemented:
>> >
>> > - is it possible to implement this as a rule, probably a RelRule (I am
>> > saying a RelRule because as far as I have understood RelRule is the way
>> you
>> > implement rules that you would like to enforce before the physical plan
>> is
>> > created)?
>> > *goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120 is
>> > present, if not then add it in the query (probably transform it).*
>> > *research done so far* : I probably need to create a RelRule and
>> implement
>> > the "matches" and "onMatch" methods. The "onMatch" method should
>> transform
>> > my rel or rex node using transformTo, probably.
>> > Something like
>> >
>> >
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html
>> > is
>> > something which I might need to implement.
>> > But a few methods are hard to grasp for me and they are:
>> >
>> >  - transformTo
>> >  - RelRule.Config.withOperandFor()
>> >
>> > Could you please give me some insight on how I can understand their
>> > structures?
>> >
>> > - is it possible to find whether a node (sql/rel) is GROUP BY or not?
>> > - if there is an AND clause in WHERE, is there a way to parse all the
>> > filter conditions separately? (using RexCall.getOperator did not work)
>> > - is this (what I have done) only a hack or is there a better way to
>> > implement this?
>> >
>> > I understand that my questions are probably not trivial, so I appreciate
>> > that you are reading my emails and providing your valuable feedback.
>> >
>> > Thank you for your time and consideration. :)
>> >
>> > With regards,
>> > Soumyadeep Mukhopadhyay.
>> >
>> >
>> >
>> > On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <as...@gmail.com>
>> > wrote:
>> >
>> > > Thank you for your kind words :)
>> > >
>> > > - Askar
>> > >
>> > > On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay <
>> > soumyamyb95@gmail.com
>> > > >
>> > > wrote:
>> > >
>> > > > Hello Askar,
>> > > >
>> > > > Thank you so much for taking the time to compile this for me.
>> > > >
>> > > > I shall be indebted to you for this effort. Please let me know if I
>> can
>> > > be
>> > > > of any assistance ever to repay your kindness.
>> > > >
>> > > > I appreciate all your help and guidance. Thank you once again. I
>> shall
>> > go
>> > > > through the same and will get back to you as soon as possible. Thank
>> > you
>> > > > again sir! :)
>> > > >
>> > > > With best regards,
>> > > > Soumyadeep Mukhopadhyay.
>> > > >
>> > > >
>> > > > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <
>> askar.mulin@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hello again!
>> > > > > I have prepared this gist to perhaps help you understand how to
>> > > traverse
>> > > > > through a Sql parse tree:
>> > > > >
>> https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
>> > > > >
>> > > > > I have implemented a Filterer class that returns false on any
>> queries
>> > > > which
>> > > > > have 'HAVING COUNT > x' where x is <= 100, and true for all other
>> > > > queries.
>> > > > > I have used Kotlin, so if you have difficulties understanding it
>> (as
>> > > > you're
>> > > > > using Scala), check out Kotlin's smart cast and nullability.
>> > > > >
>> > > > > Good luck and hopefully that was helpful,
>> > > > > Askar Bozcan
>> > > > >
>> > > > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
>> > > > > soumyamyb95@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Askar,
>> > > > > >
>> > > > > > I have gone through your email a couple of times. I am trying to
>> > > > > > understand it bit by bit.
>> > > > > >
>> > > > > > I have a request, please feel free to say no, is there any code
>> > base
>> > > > > where
>> > > > > > I can refer and understand how to implement my own filter rule?
>> > > > > > What I realised I asked wrong is I wanted to implement "(HAVING
>> > > > COUNT(*)
>> > > > > > > 100)" as a global rule, if there are any aggregation queries.
>> > > > > >
>> > > > > > It is a bit difficult for me to understand where I should start.
>> > > > > > For example, if I follow this
>> > > > > https://github.com/zabetak/calcite-tutorial where
>> > > > > > shall I start looking at?
>> > > > > > In the below image I feel like some kind of comparison is going
>> on,
>> > > > > should
>> > > > > > I devise my code like this? (in order to implement something
>> like
>> > > > > "(HAVING
>> > > > > > COUNT(*) > 100)")
>> > > > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
>> > > > > >
>> > > > > > Also is there any documentation I can go through regarding how I
>> > can
>> > > > > > traverse through the AST?
>> > > > > > Probably I am not looking in the right places but so far I could
>> > only
>> > > > go
>> > > > > > through Tab9 code examples (or the documentation provided by the
>> > > > Calcite
>> > > > > > website) and things did not seem to be clear.
>> > > > > > I realise it is probably a lot to ask, so whatever you share
>> will
>> > be
>> > > a
>> > > > > lot
>> > > > > > of help for me.
>> > > > > >
>> > > > > > Thanks again for your time, patience and help!
>> > > > > >
>> > > > > > With regards,
>> > > > > > Soumyadeep Mukhopadhyay.
>> > > > > >
>> > > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <
>> askar.mulin@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > >> Hey,
>> > > > > >> You can use Planner.reset(). Note that it has mutable state
>> > inside,
>> > > so
>> > > > > do
>> > > > > >> not reuse the same planner instance in any multi-threading
>> > > > environment.
>> > > > > >>
>> > > > > >> (I am assuming you have access to table metadata so that you
>> will
>> > be
>> > > > > able
>> > > > > >> to convert your SqlNode tree into RelNode tree, relational
>> > > expression
>> > > > > >> tree)
>> > > > > >> - Only return results above a certain threshold when using
>> GROUP
>> > BY,
>> > > > for
>> > > > > >> example (HAVING COUNT(col1) > 100).
>> > > > > >>
>> > > > > >> I'm not quite sure I understand this question is HAVING part of
>> > the
>> > > > > query?
>> > > > > >>
>> > > > > >> - Restrict the column on which joins can happen, or else throw
>> an
>> > > > error
>> > > > > >> (almost like analysisException in Spark)
>> > > > > >>
>> > > > > >> Do you have access to table/schema metadata?
>> > > > > >> If you do:
>> > > > > >> 1) Convert your parsed syntax tree (SqlNode) into a logical
>> > > relational
>> > > > > >> tree
>> > > > > >> (RelNode).
>> > > > > >> Watch this tutorial by Stamatis:
>> > > > > >> https://www.youtube.com/watch?v=p1O3E33FIs8.
>> > > > > >> It will explain way better the usage than I can in an email.
>> (And
>> > it
>> > > > > what
>> > > > > >> introduced me to Calcite's basics :))
>> > > > > >> 2) Traverse your relational tree by implementing RelShuttle
>> > > > > >> <
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
>> > > > > >> >.
>> > > > > >> (Look up Visitor pattern to better understand how this
>> recursive
>> > > > > traversal
>> > > > > >> works)
>> > > > > >> 3) Ignore any RelNode's which are not LogicalJoin.
>> > > > > >> 4) When you encounter LogicalJoin, traverse its children with
>> > > getLeft
>> > > > > and
>> > > > > >> getRight
>> > > > > >> 5) When you encounter LogicalTableScan,  You can get its
>> > RelOptTable
>> > > > > with
>> > > > > >> getTable, and then RelOptTable.getRowType() to find the fields.
>> > > > > >> (Not 100% about step 5, but should be close to it)
>> > > > > >>
>> > > > > >>
>> > > > > >> - Restrict the columns that can be passed in a select
>> statement or
>> > > > else
>> > > > > >> throw an error (like in the point above)
>> > > > > >> Same logic as above. But instead of TableScan, look at the root
>> > > > > >> LogicalProject.
>> > > > > >>
>> > > > > >> All of the above should be doable with just the parse tree
>> > (SqlNode)
>> > > > and
>> > > > > >> without access to table metadata using SqlShuttle; however,
>> it's
>> > > > easier
>> > > > > >> and
>> > > > > >> less error-prone with relational tree IMO.
>> > > > > >>
>> > > > > >> - Askar
>> > > > > >>
>> > > > > >>
>> > > > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
>> > > > > >> soumyamyb95@gmail.com>
>> > > > > >> wrote:
>> > > > > >>
>> > > > > >> > Hey Askar,
>> > > > > >> >
>> > > > > >> > It worked exactly how you suggested.
>> > > > > >> >
>> > > > > >> > ```
>> > > > > >> >
>> > > > > >> > private val sqlQuery: String = "SELECT list_of_columns FROM
>> > > > table_name
>> > > > > >> > WHERE predicate_column = 'predicate_value'"
>> > > > > >> > private val frameworkConfig: FrameworkConfig =
>> > > > > >> > Frameworks.newConfigBuilder.build()
>> > > > > >> > private val planner: Planner =
>> > > > Frameworks.getPlanner(frameworkConfig)
>> > > > > >> > private val planner2: Planner =
>> > > > Frameworks.getPlanner(frameworkConfig)
>> > > > > >> > private val planner3: Planner =
>> > > > Frameworks.getPlanner(frameworkConfig)
>> > > > > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
>> > > > > >> > println(sqlNode.getKind)
>> > > > > >> > private val sqlSelectStmt: SqlSelect =
>> > > > sqlNode.asInstanceOf[SqlSelect]
>> > > > > >> >
>> > > > > >> > private val setSelectColumnsQuery = "SELECT age"
>> > > > > >> > private val selectList =
>> > > > > >> >
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
>> > > > > >> > private val setFromTableQuery = "SELECT employee"
>> > > > > >> > private val fromTable =
>> > > > > >> >
>> > > > >
>> > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
>> > > > > >> >
>> > > > > >> > sqlSelectStmt.setSelectList(selectList)
>> > > > > >> > sqlSelectStmt.setFrom(fromTable)
>> > > > > >> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode]
>> > > > > >> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
>> > > > > >> >
>> > > > > >> > ```
>> > > > > >> >
>> > > > > >> > The only caveat I see is the need for a new planner for every
>> > new
>> > > > > query.
>> > > > > >> > Should I do something else or is this expected?
>> > > > > >> >
>> > > > > >> > On a different note, I wanted to ask about how I can write my
>> > own
>> > > > > >> rules. Is
>> > > > > >> > it possible to enforce following rules:
>> > > > > >> > - Only return results above a certain threshold when using
>> GROUP
>> > > BY,
>> > > > > for
>> > > > > >> > example (HAVING COUNT(col1) > 100).
>> > > > > >> > - Restrict the column on which joins can happen, or else
>> throw
>> > an
>> > > > > error
>> > > > > >> > (almost like analysisException in Spark)
>> > > > > >> > - Restrict the columns that can be passed in a select
>> statement
>> > or
>> > > > > else
>> > > > > >> > throw an error (like in the point above)
>> > > > > >> >
>> > > > > >> > Not sure if it is feasible, please feel free to suggest. :)
>> > Thanks
>> > > > > again
>> > > > > >> > for your time!
>> > > > > >> >
>> > > > > >> > With regards,
>> > > > > >> > Soumyadeep Mukhopadhyay.
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <
>> > askar.mulin@gmail.com
>> > > >
>> > > > > >> wrote:
>> > > > > >> >
>> > > > > >> > > Hey Soumyadeep,
>> > > > > >> > > I think that can work with a few caveats.
>> > > > > >> > > 0) Use the Planner from Frameworks
>> > > > > >> > > <
>> > > > > >> > >
>> > > > > >> >
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
>> > > > > >> > > >
>> > > > > >> > > 1) Parse the “template query" into a syntax tree (a root
>> > > SqlNode)
>> > > > > but
>> > > > > >> do
>> > > > > >> > > not validate it
>> > > > > >> > > 2) Cast the root SqlNode into a SqlSelect
>> > > > > >> > > <
>> > > > > >> > >
>> > > > > >> >
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
>> > > > > >> > > >
>> > > > > >> > > (can
>> > > > > >> > > verify that it's a SELECT by using getKind())
>> > > > > >> > > 3) Use the setters from SqlSelect to modify the children of
>> > root
>> > > > > >> > SqlSelect
>> > > > > >> > > node.
>> > > > > >> > > 4) Unparse the root query back into a string
>> > > (SqlNode.unparse()).
>> > > > > >> > >
>> > > > > >> > > Now the tricky part is 3), as you have to set proper
>> SqlNode
>> > > types
>> > > > > as
>> > > > > >> > > children of the SELECT node for column names, table names
>> and
>> > > for
>> > > > > >> WHERE
>> > > > > >> > > predicate. I don't remember them properly, but what you
>> can do
>> > > is
>> > > > > >> parse a
>> > > > > >> > > proper query (again, don't validate it as you'll then need
>> > table
>> > > > > >> > metadata),
>> > > > > >> > > and check the kinds (getKind()) of children SqlNode's and
>> > > replace
>> > > > > them
>> > > > > >> > with
>> > > > > >> > > your replacements using setters of root SqlSelect node.
>> > > > > >> > >
>> > > > > >> > > This should work, but I'm not 100% certain as I'm unable to
>> > > check
>> > > > > >> right
>> > > > > >> > > now.
>> > > > > >> > >
>> > > > > >> > > PS: Take care about dialects. Dialects are used in
>> unparsing
>> > as
>> > > a
>> > > > > >> > > "configuration" of SqlWriter, and can unparse the syntax
>> tree
>> > > > > >> differently
>> > > > > >> > > based on the dialect you have chosen.
>> > > > > >> > >
>> > > > > >> > > Kind regards,
>> > > > > >> > > Askar Bozcan
>> > > > > >> > >
>> > > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
>> > > > > >> soumyamyb95@gmail.com>
>> > > > > >> > > wrote:
>> > > > > >> > >
>> > > > > >> > > Hello All,
>> > > > > >> > >
>> > > > > >> > > I have just heard of Apache Calcite and was exploring the
>> > > > > >> possibilities.
>> > > > > >> > I
>> > > > > >> > > wish to achieve the following, and wanted to check if my
>> hunch
>> > > is
>> > > > > >> > correct:
>> > > > > >> > > - Use a template to build SQL queries, like use jinja-sql
>> or
>> > > even
>> > > > > >> pebble
>> > > > > >> > > (interpret the SqlNode tree kind of structure from my
>> template
>> > > and
>> > > > > >> then
>> > > > > >> > add
>> > > > > >> > > the necessary fields like table name and group by fields
>> from
>> > an
>> > > > > >> input)
>> > > > > >> > >
>> > > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would
>> be
>> > > > inside
>> > > > > a
>> > > > > >> > query
>> > > > > >> > > template (in Jinja-sql it may look like "SELECT
>> > {{select_fields
>> > > |
>> > > > > >> > sqlsafe}}
>> > > > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause |
>> > sqlsafe}}"
>> > > > and
>> > > > > >> > values
>> > > > > >> > > like 'select_fields' would be substituted at run-time from
>> a
>> > > > > >> processing
>> > > > > >> > > engine like https://pypi.org/project/Jinja2/, but for
>> Java)
>> > and
>> > > > the
>> > > > > >> > output
>> > > > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL"
>> with
>> > > > some
>> > > > > >> > dialect
>> > > > > >> > > (like Snowflake or Big-Query).
>> > > > > >> > >
>> > > > > >> > > Is this possible? Any recommendations or suggestions are
>> > > welcome.
>> > > > > >> Even if
>> > > > > >> > > the approach feels wrong please let me know. :)
>> > > > > >> > > Thank you for your time and consideration.
>> > > > > >> > >
>> > > > > >> > > With regards,
>> > > > > >> > > Soumyadeep Mukhopadhyay.
>> > > > > >> > >
>> > > > > >> >
>> > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: Template SQL

Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.

Hey Askar,

Thank you so much for your email. Please take as much time as you need. I
will keep trying in the meantime. :)

With regards,
Soumyadeep Mukhopadhyay.

On Thu, 4 May 2023 at 3:04 AM, Askar Bozcan <as...@gmail.com> wrote:

> Hey, sorry for the late reply but I had some stuff to do IRL & some
> deadlines I need to meet right now. I'll try to answer you as soon as I
> can, but this week doesn't seem possible.
>
> Keep the questions coming, however, if you haven't solved them yourself
> (and if you did, please share your solutions). I'm planning to write a
> sort-of "Comprehensive Calcite 101" documentation in the near future which
> I am going to be basing on my own experiences & questions in the community.
>
> Kind regards,
> Askar
>
> On Mon, 24 Apr 2023 at 22:52, Soumyadeep Mukhopadhyay <
> soumyamyb95@gmail.com>
> wrote:
>
> > Hello Askar,
> >
> > Thank you so much for guiding me, I have taken some inspiration from your
> > code and from Dremio's implementation and compiled this -
> >
> https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b
> > .
> >
> > I have a few follow-up questions, if I may, regarding what I have
> > implemented:
> >
> > - is it possible to implement this as a rule, probably a RelRule (I am
> > saying a RelRule because as far as I have understood RelRule is the way
> you
> > implement rules that you would like to enforce before the physical plan
> is
> > created)?
> > *goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120 is
> > present, if not then add it in the query (probably transform it).*
> > *research done so far* : I probably need to create a RelRule and
> implement
> > the "matches" and "onMatch" methods. The "onMatch" method should
> transform
> > my rel or rex node using transformTo, probably.
> > Something like
> >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html
> > is
> > something which I might need to implement.
> > But a few methods are hard to grasp for me and they are:
> >
> >  - transformTo
> >  - RelRule.Config.withOperandFor()
> >
> > Could you please give me some insight on how I can understand their
> > structures?
> >
> > - is it possible to find whether a node (sql/rel) is GROUP BY or not?
> > - if there is an AND clause in WHERE, is there a way to parse all the
> > filter conditions separately? (using RexCall.getOperator did not work)
> > - is this (what I have done) only a hack or is there a better way to
> > implement this?
> >
> > I understand that my questions are probably not trivial, so I appreciate
> > that you are reading my emails and providing your valuable feedback.
> >
> > Thank you for your time and consideration. :)
> >
> > With regards,
> > Soumyadeep Mukhopadhyay.
> >
> >
> >
> > On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <as...@gmail.com>
> > wrote:
> >
> > > Thank you for your kind words :)
> > >
> > > - Askar
> > >
> > > On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay <
> > soumyamyb95@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hello Askar,
> > > >
> > > > Thank you so much for taking the time to compile this for me.
> > > >
> > > > I shall be indebted to you for this effort. Please let me know if I
> can
> > > be
> > > > of any assistance ever to repay your kindness.
> > > >
> > > > I appreciate all your help and guidance. Thank you once again. I
> shall
> > go
> > > > through the same and will get back to you as soon as possible. Thank
> > you
> > > > again sir! :)
> > > >
> > > > With best regards,
> > > > Soumyadeep Mukhopadhyay.
> > > >
> > > >
> > > > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <askar.mulin@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Hello again!
> > > > > I have prepared this gist to perhaps help you understand how to
> > > traverse
> > > > > through a Sql parse tree:
> > > > >
> https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
> > > > >
> > > > > I have implemented a Filterer class that returns false on any
> queries
> > > > which
> > > > > have 'HAVING COUNT > x' where x is <= 100, and true for all other
> > > > queries.
> > > > > I have used Kotlin, so if you have difficulties understanding it
> (as
> > > > you're
> > > > > using Scala), check out Kotlin's smart cast and nullability.
> > > > >
> > > > > Good luck and hopefully that was helpful,
> > > > > Askar Bozcan
> > > > >
> > > > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
> > > > > soumyamyb95@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Askar,
> > > > > >
> > > > > > I have gone through your email a couple of times. I am trying to
> > > > > > understand it bit by bit.
> > > > > >
> > > > > > I have a request, please feel free to say no, is there any code
> > base
> > > > > where
> > > > > > I can refer and understand how to implement my own filter rule?
> > > > > > What I realised I asked wrong is I wanted to implement "(HAVING
> > > > COUNT(*)
> > > > > > > 100)" as a global rule, if there are any aggregation queries.
> > > > > >
> > > > > > It is a bit difficult for me to understand where I should start.
> > > > > > For example, if I follow this
> > > > > https://github.com/zabetak/calcite-tutorial where
> > > > > > shall I start looking at?
> > > > > > In the below image I feel like some kind of comparison is going
> on,
> > > > > should
> > > > > > I devise my code like this? (in order to implement something like
> > > > > "(HAVING
> > > > > > COUNT(*) > 100)")
> > > > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
> > > > > >
> > > > > > Also is there any documentation I can go through regarding how I
> > can
> > > > > > traverse through the AST?
> > > > > > Probably I am not looking in the right places but so far I could
> > only
> > > > go
> > > > > > through Tab9 code examples (or the documentation provided by the
> > > > Calcite
> > > > > > website) and things did not seem to be clear.
> > > > > > I realise it is probably a lot to ask, so whatever you share will
> > be
> > > a
> > > > > lot
> > > > > > of help for me.
> > > > > >
> > > > > > Thanks again for your time, patience and help!
> > > > > >
> > > > > > With regards,
> > > > > > Soumyadeep Mukhopadhyay.
> > > > > >
> > > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <
> askar.mulin@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > >> Hey,
> > > > > >> You can use Planner.reset(). Note that it has mutable state
> > inside,
> > > so
> > > > > do
> > > > > >> not reuse the same planner instance in any multi-threading
> > > > environment.
> > > > > >>
> > > > > >> (I am assuming you have access to table metadata so that you
> will
> > be
> > > > > able
> > > > > >> to convert your SqlNode tree into RelNode tree, relational
> > > expression
> > > > > >> tree)
> > > > > >> - Only return results above a certain threshold when using GROUP
> > BY,
> > > > for
> > > > > >> example (HAVING COUNT(col1) > 100).
> > > > > >>
> > > > > >> I'm not quite sure I understand this question is HAVING part of
> > the
> > > > > query?
> > > > > >>
> > > > > >> - Restrict the column on which joins can happen, or else throw
> an
> > > > error
> > > > > >> (almost like analysisException in Spark)
> > > > > >>
> > > > > >> Do you have access to table/schema metadata?
> > > > > >> If you do:
> > > > > >> 1) Convert your parsed syntax tree (SqlNode) into a logical
> > > relational
> > > > > >> tree
> > > > > >> (RelNode).
> > > > > >> Watch this tutorial by Stamatis:
> > > > > >> https://www.youtube.com/watch?v=p1O3E33FIs8.
> > > > > >> It will explain way better the usage than I can in an email.
> (And
> > it
> > > > > what
> > > > > >> introduced me to Calcite's basics :))
> > > > > >> 2) Traverse your relational tree by implementing RelShuttle
> > > > > >> <
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
> > > > > >> >.
> > > > > >> (Look up Visitor pattern to better understand how this recursive
> > > > > traversal
> > > > > >> works)
> > > > > >> 3) Ignore any RelNode's which are not LogicalJoin.
> > > > > >> 4) When you encounter LogicalJoin, traverse its children with
> > > getLeft
> > > > > and
> > > > > >> getRight
> > > > > >> 5) When you encounter LogicalTableScan,  You can get its
> > RelOptTable
> > > > > with
> > > > > >> getTable, and then RelOptTable.getRowType() to find the fields.
> > > > > >> (Not 100% about step 5, but should be close to it)
> > > > > >>
> > > > > >>
> > > > > >> - Restrict the columns that can be passed in a select statement
> or
> > > > else
> > > > > >> throw an error (like in the point above)
> > > > > >> Same logic as above. But instead of TableScan, look at the root
> > > > > >> LogicalProject.
> > > > > >>
> > > > > >> All of the above should be doable with just the parse tree
> > (SqlNode)
> > > > and
> > > > > >> without access to table metadata using SqlShuttle; however, it's
> > > > easier
> > > > > >> and
> > > > > >> less error-prone with relational tree IMO.
> > > > > >>
> > > > > >> - Askar
> > > > > >>
> > > > > >>
> > > > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
> > > > > >> soumyamyb95@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hey Askar,
> > > > > >> >
> > > > > >> > It worked exactly how you suggested.
> > > > > >> >
> > > > > >> > ```
> > > > > >> >
> > > > > >> > private val sqlQuery: String = "SELECT list_of_columns FROM
> > > > table_name
> > > > > >> > WHERE predicate_column = 'predicate_value'"
> > > > > >> > private val frameworkConfig: FrameworkConfig =
> > > > > >> > Frameworks.newConfigBuilder.build()
> > > > > >> > private val planner: Planner =
> > > > Frameworks.getPlanner(frameworkConfig)
> > > > > >> > private val planner2: Planner =
> > > > Frameworks.getPlanner(frameworkConfig)
> > > > > >> > private val planner3: Planner =
> > > > Frameworks.getPlanner(frameworkConfig)
> > > > > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
> > > > > >> > println(sqlNode.getKind)
> > > > > >> > private val sqlSelectStmt: SqlSelect =
> > > > sqlNode.asInstanceOf[SqlSelect]
> > > > > >> >
> > > > > >> > private val setSelectColumnsQuery = "SELECT age"
> > > > > >> > private val selectList =
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
> > > > > >> > private val setFromTableQuery = "SELECT employee"
> > > > > >> > private val fromTable =
> > > > > >> >
> > > > >
> > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
> > > > > >> >
> > > > > >> > sqlSelectStmt.setSelectList(selectList)
> > > > > >> > sqlSelectStmt.setFrom(fromTable)
> > > > > >> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode]
> > > > > >> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
> > > > > >> >
> > > > > >> > ```
> > > > > >> >
> > > > > >> > The only caveat I see is the need for a new planner for every
> > new
> > > > > query.
> > > > > >> > Should I do something else or is this expected?
> > > > > >> >
> > > > > >> > On a different note, I wanted to ask about how I can write my
> > own
> > > > > >> rules. Is
> > > > > >> > it possible to enforce following rules:
> > > > > >> > - Only return results above a certain threshold when using
> GROUP
> > > BY,
> > > > > for
> > > > > >> > example (HAVING COUNT(col1) > 100).
> > > > > >> > - Restrict the column on which joins can happen, or else throw
> > an
> > > > > error
> > > > > >> > (almost like analysisException in Spark)
> > > > > >> > - Restrict the columns that can be passed in a select
> statement
> > or
> > > > > else
> > > > > >> > throw an error (like in the point above)
> > > > > >> >
> > > > > >> > Not sure if it is feasible, please feel free to suggest. :)
> > Thanks
> > > > > again
> > > > > >> > for your time!
> > > > > >> >
> > > > > >> > With regards,
> > > > > >> > Soumyadeep Mukhopadhyay.
> > > > > >> >
> > > > > >> >
> > > > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <
> > askar.mulin@gmail.com
> > > >
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Hey Soumyadeep,
> > > > > >> > > I think that can work with a few caveats.
> > > > > >> > > 0) Use the Planner from Frameworks
> > > > > >> > > <
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
> > > > > >> > > >
> > > > > >> > > 1) Parse the “template query" into a syntax tree (a root
> > > SqlNode)
> > > > > but
> > > > > >> do
> > > > > >> > > not validate it
> > > > > >> > > 2) Cast the root SqlNode into a SqlSelect
> > > > > >> > > <
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
> > > > > >> > > >
> > > > > >> > > (can
> > > > > >> > > verify that it's a SELECT by using getKind())
> > > > > >> > > 3) Use the setters from SqlSelect to modify the children of
> > root
> > > > > >> > SqlSelect
> > > > > >> > > node.
> > > > > >> > > 4) Unparse the root query back into a string
> > > (SqlNode.unparse()).
> > > > > >> > >
> > > > > >> > > Now the tricky part is 3), as you have to set proper SqlNode
> > > types
> > > > > as
> > > > > >> > > children of the SELECT node for column names, table names
> and
> > > for
> > > > > >> WHERE
> > > > > >> > > predicate. I don't remember them properly, but what you can
> do
> > > is
> > > > > >> parse a
> > > > > >> > > proper query (again, don't validate it as you'll then need
> > table
> > > > > >> > metadata),
> > > > > >> > > and check the kinds (getKind()) of children SqlNode's and
> > > replace
> > > > > them
> > > > > >> > with
> > > > > >> > > your replacements using setters of root SqlSelect node.
> > > > > >> > >
> > > > > >> > > This should work, but I'm not 100% certain as I'm unable to
> > > check
> > > > > >> right
> > > > > >> > > now.
> > > > > >> > >
> > > > > >> > > PS: Take care about dialects. Dialects are used in unparsing
> > as
> > > a
> > > > > >> > > "configuration" of SqlWriter, and can unparse the syntax
> tree
> > > > > >> differently
> > > > > >> > > based on the dialect you have chosen.
> > > > > >> > >
> > > > > >> > > Kind regards,
> > > > > >> > > Askar Bozcan
> > > > > >> > >
> > > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
> > > > > >> soumyamyb95@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > Hello All,
> > > > > >> > >
> > > > > >> > > I have just heard of Apache Calcite and was exploring the
> > > > > >> possibilities.
> > > > > >> > I
> > > > > >> > > wish to achieve the following, and wanted to check if my
> hunch
> > > is
> > > > > >> > correct:
> > > > > >> > > - Use a template to build SQL queries, like use jinja-sql or
> > > even
> > > > > >> pebble
> > > > > >> > > (interpret the SqlNode tree kind of structure from my
> template
> > > and
> > > > > >> then
> > > > > >> > add
> > > > > >> > > the necessary fields like table name and group by fields
> from
> > an
> > > > > >> input)
> > > > > >> > >
> > > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would be
> > > > inside
> > > > > a
> > > > > >> > query
> > > > > >> > > template (in Jinja-sql it may look like "SELECT
> > {{select_fields
> > > |
> > > > > >> > sqlsafe}}
> > > > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause |
> > sqlsafe}}"
> > > > and
> > > > > >> > values
> > > > > >> > > like 'select_fields' would be substituted at run-time from a
> > > > > >> processing
> > > > > >> > > engine like https://pypi.org/project/Jinja2/, but for Java)
> > and
> > > > the
> > > > > >> > output
> > > > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL"
> with
> > > > some
> > > > > >> > dialect
> > > > > >> > > (like Snowflake or Big-Query).
> > > > > >> > >
> > > > > >> > > Is this possible? Any recommendations or suggestions are
> > > welcome.
> > > > > >> Even if
> > > > > >> > > the approach feels wrong please let me know. :)
> > > > > >> > > Thank you for your time and consideration.
> > > > > >> > >
> > > > > >> > > With regards,
> > > > > >> > > Soumyadeep Mukhopadhyay.
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>