You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Sandeep N <fe...@gmail.com> on 2022/08/04 01:42:06 UTC

Summary of experience using Calcite

Hi all,

I wanted to share my experience with one of the current projects that I
tried to use Calcite with and through this experience hope to shed light on
some of the challenges I ran into and share my thoughts on how they could
be addressed.

Apologies for the long drawn explanation but I think context will help you
guys understand the challenges I hope to surface.

For the tldr version jump to the paragraph with heading "In Summary".

Let me first note that my use of Calcite  is unconventional in the sense
that I was looking to convert queries written in a homegrown DSL to SQL
that could be run on Snowflake. There were two main challenges : (a)
Mapping the DSL to SQL and (b) Generating SQL compliant to Snowflake with
support for custom data-types like VARIANT.

The approach I landed on based on feedback from this forum was to take the
AST from the DSL, use a tree walker to generate a relational algebra tree
which then can generate SQL using one of the Dialects (Snowflake to be
added).

The first task involved asking questions on the dev forum and reading
through previously asked questions. Here I have to extend thanks to
Stamatis for the pointers and links he provided which helped immensely. In
the end it took those pointers + step through debugging of the code via
unit-tests to figure how to use the relational algebra tree (RelNode and
RexNode).

The second task was to add Snowflake specific constructs and here again
thanks to  Stamatis and Julian I had pointers to start looking at the
pieces. But before I went down the path of extending relational algebra or
figuring adding snowflake functions I shared my prototype work with my team
and talked through the approach and the Calcite integration. My goal was to
get a general sense of whether my team thought this was a good path forward
(some of the reason for this will be obvious in the In Summary section).

A few of the team members looked at extending my prototype with Calcite and
after a couple of weeks of working with it we as a team arrived at the
decision that we will hold off on using  Calcite and instead build
something simpler and homegrown. "Simpler" because the types of queries we
needed to translate to SQL were very limited and did not exercise all the
possible types of SQL queries. We may revisit this in the future and come
back to Calcite.

In Summary : The feedback I have is as follows
(a) Calcite does not have enough examples, docs and how-to guides so it is
difficult to get started. For this one I think our usage of Calcite being
unconventional added to the complexity as the samples might not directly
map to what we were looking for - one needs to extrapolate from what is
done from SQL to applying the same for another DSL.

(b) Learning curve for Calcite is steep. - It takes a lot of time and
patience to debug through code and understand Calcite before using it. It
takes a certain mind-set of "don't give up" and "must figure this out" to
make headway. My experience with some of the technologies (having written
micro-containers and classloaders) have left me with a higher threshold of
patience - but that is not for everyone.

I will say that the dev forums are prompt to respond to queries but at
times it might not be adequate for newbies - One suggestion I can think of
is to pair a newbie with a seasoned Calcite contributor as a mentor for
tasks which need such a pairing e.g. adding a new Database Dialect. It is a
fairly big commitment/investment on the part of the mentor but it might
make inroads into the project easier. Mentors could assign tasks of
improving docs/samples/how-to guides to the newbies as they work with them.
It would address the challenges for future adopters and also build
confidence in the newbies to contribute.

I will end on a thank you note, just the process of figuring out the
relational algebra tree and translation taught me a lot. Thanks to you guys
for all the contributions. Like I said earlier I am still hopeful we will
come back to Calcite and thus plan to stay plugged in.

Cheers,

--Sandeep Nayak

Re: Summary of experience using Calcite

Posted by Sandeep N <fe...@gmail.com>.
Thanks for taking the time to read and reply. It is encouraging to hear
that my use of Calcite is not unconventional, it does validate the thought
that we can come back to Calcite in future if necessary.

I couldn't agree more with the point you made, there is simply no other
tool available with the kind of capabilities Calcite provides.

I believe to be successful with Calcite a couple of us (from my team) will
need to make time to engage and contribute to Calcite with the long term
goal of leveraging this toolkit when the time and need arises. We are
building stuff in this space so we might as well stay aware of the tools.
Trying to look at and treat Calcite like any open source library/tool isn't
the right frame of mind.

Regards,

--Sandeep

On Thu, Aug 4, 2022 at 7:00 AM Vladimir Ozerov <pp...@gmail.com> wrote:

> Hi Sandeep,
>
> Thank you for sharing your experience. Your feedback matches with what I
> usually see in practice. IMO there are two main problems with Apache
> Calcite:
>
>    1. Documentation that focuses on a single edge case - the end-to-end
>    execution of federated queries with Enumerable. In practice, this
>    approach rarely could be used for anything but simple projects. Instead,
>    major Apache Calcite users (Apache Hive, Apache Flink, etc) do what you
>    mentioned as "unconventional use": they use separate Calcite components
> for
>    query optimization, translation, DSLs, etc. And there is virtually no
>    documentation on how to use them. So your use case is not unusual;
> there is
>    just no docs for it.
>    2. Another big problem is stability and quality. There are many bugs in
>    optimization rules, rel-to-sql conversion, etc. When doing a
> sophisticated
>    Apache Calcite integration, you should be ready to find a creative
>    workarounds for various bugs: method overrides, unusual class
> hierarchies,
>    copy-pasting of Apache Calcite codebase, etc.
>
> Unfortunately, there is no major commercial vendor behind the community
> that systematically invests in documentation and quality, so it is what it
> is.
>
> However, if you manage to get through these difficulties, you will get an
> extremely powerful tool, and I personally not aware of any other project
> that is even close to Apache Calcite in query optimization and translation
> capabilities. This is why there are more and more database and data
> management projects that choose Apache Calcite as the backbone of their
> query processing.
>
> If your use case is simple, then perhaps Apache Calcite might be too
> heavy-weight. However, if you plan to make your system more complicated, I
> would recommend you to keep Apache Calcite in mind still.
>
> Regards,
> Vladimir.
>
> чт, 4 авг. 2022 г. в 04:42, Sandeep N <fe...@gmail.com>:
>
> > Hi all,
> >
> > I wanted to share my experience with one of the current projects that I
> > tried to use Calcite with and through this experience hope to shed light
> on
> > some of the challenges I ran into and share my thoughts on how they could
> > be addressed.
> >
> > Apologies for the long drawn explanation but I think context will help
> you
> > guys understand the challenges I hope to surface.
> >
> > For the tldr version jump to the paragraph with heading "In Summary".
> >
> > Let me first note that my use of Calcite  is unconventional in the sense
> > that I was looking to convert queries written in a homegrown DSL to SQL
> > that could be run on Snowflake. There were two main challenges : (a)
> > Mapping the DSL to SQL and (b) Generating SQL compliant to Snowflake with
> > support for custom data-types like VARIANT.
> >
> > The approach I landed on based on feedback from this forum was to take
> the
> > AST from the DSL, use a tree walker to generate a relational algebra tree
> > which then can generate SQL using one of the Dialects (Snowflake to be
> > added).
> >
> > The first task involved asking questions on the dev forum and reading
> > through previously asked questions. Here I have to extend thanks to
> > Stamatis for the pointers and links he provided which helped immensely.
> In
> > the end it took those pointers + step through debugging of the code via
> > unit-tests to figure how to use the relational algebra tree (RelNode and
> > RexNode).
> >
> > The second task was to add Snowflake specific constructs and here again
> > thanks to  Stamatis and Julian I had pointers to start looking at the
> > pieces. But before I went down the path of extending relational algebra
> or
> > figuring adding snowflake functions I shared my prototype work with my
> team
> > and talked through the approach and the Calcite integration. My goal was
> to
> > get a general sense of whether my team thought this was a good path
> forward
> > (some of the reason for this will be obvious in the In Summary section).
> >
> > A few of the team members looked at extending my prototype with Calcite
> and
> > after a couple of weeks of working with it we as a team arrived at the
> > decision that we will hold off on using  Calcite and instead build
> > something simpler and homegrown. "Simpler" because the types of queries
> we
> > needed to translate to SQL were very limited and did not exercise all the
> > possible types of SQL queries. We may revisit this in the future and come
> > back to Calcite.
> >
> > In Summary : The feedback I have is as follows
> > (a) Calcite does not have enough examples, docs and how-to guides so it
> is
> > difficult to get started. For this one I think our usage of Calcite being
> > unconventional added to the complexity as the samples might not directly
> > map to what we were looking for - one needs to extrapolate from what is
> > done from SQL to applying the same for another DSL.
> >
> > (b) Learning curve for Calcite is steep. - It takes a lot of time and
> > patience to debug through code and understand Calcite before using it. It
> > takes a certain mind-set of "don't give up" and "must figure this out" to
> > make headway. My experience with some of the technologies (having written
> > micro-containers and classloaders) have left me with a higher threshold
> of
> > patience - but that is not for everyone.
> >
> > I will say that the dev forums are prompt to respond to queries but at
> > times it might not be adequate for newbies - One suggestion I can think
> of
> > is to pair a newbie with a seasoned Calcite contributor as a mentor for
> > tasks which need such a pairing e.g. adding a new Database Dialect. It
> is a
> > fairly big commitment/investment on the part of the mentor but it might
> > make inroads into the project easier. Mentors could assign tasks of
> > improving docs/samples/how-to guides to the newbies as they work with
> them.
> > It would address the challenges for future adopters and also build
> > confidence in the newbies to contribute.
> >
> > I will end on a thank you note, just the process of figuring out the
> > relational algebra tree and translation taught me a lot. Thanks to you
> guys
> > for all the contributions. Like I said earlier I am still hopeful we will
> > come back to Calcite and thus plan to stay plugged in.
> >
> > Cheers,
> >
> > --Sandeep Nayak
> >
>

Re: Summary of experience using Calcite

Posted by Vladimir Ozerov <pp...@gmail.com>.
Hi Sandeep,

Thank you for sharing your experience. Your feedback matches with what I
usually see in practice. IMO there are two main problems with Apache
Calcite:

   1. Documentation that focuses on a single edge case - the end-to-end
   execution of federated queries with Enumerable. In practice, this
   approach rarely could be used for anything but simple projects. Instead,
   major Apache Calcite users (Apache Hive, Apache Flink, etc) do what you
   mentioned as "unconventional use": they use separate Calcite components for
   query optimization, translation, DSLs, etc. And there is virtually no
   documentation on how to use them. So your use case is not unusual; there is
   just no docs for it.
   2. Another big problem is stability and quality. There are many bugs in
   optimization rules, rel-to-sql conversion, etc. When doing a sophisticated
   Apache Calcite integration, you should be ready to find a creative
   workarounds for various bugs: method overrides, unusual class hierarchies,
   copy-pasting of Apache Calcite codebase, etc.

Unfortunately, there is no major commercial vendor behind the community
that systematically invests in documentation and quality, so it is what it
is.

However, if you manage to get through these difficulties, you will get an
extremely powerful tool, and I personally not aware of any other project
that is even close to Apache Calcite in query optimization and translation
capabilities. This is why there are more and more database and data
management projects that choose Apache Calcite as the backbone of their
query processing.

If your use case is simple, then perhaps Apache Calcite might be too
heavy-weight. However, if you plan to make your system more complicated, I
would recommend you to keep Apache Calcite in mind still.

Regards,
Vladimir.

чт, 4 авг. 2022 г. в 04:42, Sandeep N <fe...@gmail.com>:

> Hi all,
>
> I wanted to share my experience with one of the current projects that I
> tried to use Calcite with and through this experience hope to shed light on
> some of the challenges I ran into and share my thoughts on how they could
> be addressed.
>
> Apologies for the long drawn explanation but I think context will help you
> guys understand the challenges I hope to surface.
>
> For the tldr version jump to the paragraph with heading "In Summary".
>
> Let me first note that my use of Calcite  is unconventional in the sense
> that I was looking to convert queries written in a homegrown DSL to SQL
> that could be run on Snowflake. There were two main challenges : (a)
> Mapping the DSL to SQL and (b) Generating SQL compliant to Snowflake with
> support for custom data-types like VARIANT.
>
> The approach I landed on based on feedback from this forum was to take the
> AST from the DSL, use a tree walker to generate a relational algebra tree
> which then can generate SQL using one of the Dialects (Snowflake to be
> added).
>
> The first task involved asking questions on the dev forum and reading
> through previously asked questions. Here I have to extend thanks to
> Stamatis for the pointers and links he provided which helped immensely. In
> the end it took those pointers + step through debugging of the code via
> unit-tests to figure how to use the relational algebra tree (RelNode and
> RexNode).
>
> The second task was to add Snowflake specific constructs and here again
> thanks to  Stamatis and Julian I had pointers to start looking at the
> pieces. But before I went down the path of extending relational algebra or
> figuring adding snowflake functions I shared my prototype work with my team
> and talked through the approach and the Calcite integration. My goal was to
> get a general sense of whether my team thought this was a good path forward
> (some of the reason for this will be obvious in the In Summary section).
>
> A few of the team members looked at extending my prototype with Calcite and
> after a couple of weeks of working with it we as a team arrived at the
> decision that we will hold off on using  Calcite and instead build
> something simpler and homegrown. "Simpler" because the types of queries we
> needed to translate to SQL were very limited and did not exercise all the
> possible types of SQL queries. We may revisit this in the future and come
> back to Calcite.
>
> In Summary : The feedback I have is as follows
> (a) Calcite does not have enough examples, docs and how-to guides so it is
> difficult to get started. For this one I think our usage of Calcite being
> unconventional added to the complexity as the samples might not directly
> map to what we were looking for - one needs to extrapolate from what is
> done from SQL to applying the same for another DSL.
>
> (b) Learning curve for Calcite is steep. - It takes a lot of time and
> patience to debug through code and understand Calcite before using it. It
> takes a certain mind-set of "don't give up" and "must figure this out" to
> make headway. My experience with some of the technologies (having written
> micro-containers and classloaders) have left me with a higher threshold of
> patience - but that is not for everyone.
>
> I will say that the dev forums are prompt to respond to queries but at
> times it might not be adequate for newbies - One suggestion I can think of
> is to pair a newbie with a seasoned Calcite contributor as a mentor for
> tasks which need such a pairing e.g. adding a new Database Dialect. It is a
> fairly big commitment/investment on the part of the mentor but it might
> make inroads into the project easier. Mentors could assign tasks of
> improving docs/samples/how-to guides to the newbies as they work with them.
> It would address the challenges for future adopters and also build
> confidence in the newbies to contribute.
>
> I will end on a thank you note, just the process of figuring out the
> relational algebra tree and translation taught me a lot. Thanks to you guys
> for all the contributions. Like I said earlier I am still hopeful we will
> come back to Calcite and thus plan to stay plugged in.
>
> Cheers,
>
> --Sandeep Nayak
>