Posted to dev@calcite.apache.org by Pranav Deshpande <de...@gmail.com> on 2022/07/20 22:56:40 UTC

Requesting Information Regarding Data Federation

Dear Apache Calcite Team,
I am trying to learn Calcite and would like to build a proof of concept (PoC) for data federation.

In the video here, https://www.youtube.com/watch?v=4JAOkLKrcYE, the
presenter and his team managed to collapse parts of the tree of relational
nodes into "Spark Tables", and Spark then handled the execution of those parts.

How exactly do I go about doing this?

As per this discussion I understand that one has to create a RelOptRule to
do the same.

Also, one has to somehow define the cost (I don't know how to do this).

Is there a simple tutorial that demonstrates the basics of this, for example
a small implementation using ListTable?

Thanks & Regards,
Pranav

Re: Requesting Information Regarding Data Federation

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi Pranav,

A very simplistic example of using Calcite for data integration can be
found here [1] along with some links to presentations and relevant material.

Apart from Apache Drill, Apache Hive also uses Calcite to execute
federated queries. The main entry point is CalcitePlannerAction#apply [2],
where most of the Calcite configuration is done.
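
On the rule and cost question, the general idea can be sketched without any
Calcite classes. What follows is a toy illustration only: Plan, Rule,
PUSH_FILTER_TO_SPARK, the cost weights and the selectivity are all made up
for this example and are not Calcite's API. In Calcite itself the
corresponding pieces are a RelOptRule (which registers an equivalent
alternative) and RelNode#computeSelfCost (which returns the cost the planner
compares).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ToyFederationPlanner {

    /** A candidate plan: which engine runs it and how many rows it touches. */
    static final class Plan {
        final String engine;
        final double rowsProcessed;   // rows the engine has to scan or filter
        final double rowsTransferred; // rows shipped back to the coordinator

        Plan(String engine, double rowsProcessed, double rowsTransferred) {
            this.engine = engine;
            this.rowsProcessed = rowsProcessed;
            this.rowsTransferred = rowsTransferred;
        }

        /** Hypothetical cost model: moving a row costs 10x as much as processing it. */
        double cost() {
            return rowsProcessed + 10.0 * rowsTransferred;
        }
    }

    /** A rule inspects a plan and may offer an equivalent alternative. */
    interface Rule {
        Plan apply(Plan input); // returns null when the rule does not match
    }

    /** Hypothetical rule: run the filter inside Spark so fewer rows are shipped. */
    static final Rule PUSH_FILTER_TO_SPARK = input -> {
        if (!input.engine.equals("local")) {
            return null; // only rewrites plans that currently run locally
        }
        double selectivity = 0.05; // assumed fraction of rows that pass the filter
        return new Plan("spark", input.rowsProcessed, input.rowsTransferred * selectivity);
    };

    /** Applies every rule once to the initial plan and keeps the cheapest candidate. */
    static Plan cheapest(Plan initial, List<Rule> rules) {
        List<Plan> candidates = new ArrayList<>();
        candidates.add(initial);
        for (Rule rule : rules) {
            Plan alternative = rule.apply(initial);
            if (alternative != null) {
                candidates.add(alternative);
            }
        }
        return candidates.stream().min(Comparator.comparingDouble(Plan::cost)).get();
    }

    public static void main(String[] args) {
        Plan initial = new Plan("local", 1_000_000, 1_000_000);
        Plan best = cheapest(initial, List.of(PUSH_FILTER_TO_SPARK));
        System.out.println("chosen engine: " + best.engine + ", cost: " + best.cost());
    }
}
```

The point of the sketch is the division of labour: the rule only proposes an
alternative, and the cost function alone decides whether it is used. A real
planner applies rules repeatedly across the whole tree rather than once to
the root.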

Best,
Stamatis

[1] https://github.com/zabetak/cy-calcite-tutorial
[2] https://github.com/apache/hive/blob/834308091624c1a69cba7a8b97919ed1ff0fc616/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1646

On Thu, Jul 21, 2022 at 2:01 AM Charles Givre <cg...@gmail.com> wrote:

> Hi Pranav,
> You might want to take a look at Apache Drill, as it uses Calcite as a
> query planner and can execute federated queries against a pretty wide
> array of data sets.
> Best,
> -- C

Re: Requesting Information Regarding Data Federation

Posted by Charles Givre <cg...@gmail.com>.
Hi Pranav,
You might want to take a look at Apache Drill, as it uses Calcite as a query planner and can execute federated queries against a pretty wide array of data sets.
Best,
-- C
