
Posted to dev@calcite.apache.org by Nicola Vitucci <ni...@gmail.com> on 2022/01/30 22:35:31 UTC

Using Calcite with Python

Hi all,

What would be the best way to use Calcite with Python? I've come up with
two potential solutions:

- using the jaydebeapi package, to connect via the JDBC driver directly
from a JVM created via jpype;
- using Apache Arrow via the pyarrow package, to connect in basically the
same way but creating Arrow objects with JdbcToArrowUtils (and optionally
converting them to Pandas).
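
A minimal sketch of the first option, for readers who have not opened the
notebooks. The driver class and the jdbc:calcite:model=... URL form are the
standard Calcite ones; the model path, jar list, and helper names are
hypothetical placeholders, and jaydebeapi (with jpype) is assumed to be
installed.

```python
CALCITE_DRIVER = "org.apache.calcite.jdbc.Driver"


def calcite_jdbc_url(model_path):
    """Build a Calcite JDBC URL that points at a model file."""
    return f"jdbc:calcite:model={model_path}"


def query_calcite(model_path, jar_paths, sql):
    """Run one query through the Calcite JDBC driver and return (columns, rows).

    jar_paths must contain Calcite and every adapter it needs (assumption:
    you have collected these jars yourself, e.g. as a single shaded jar).
    """
    import jaydebeapi  # third-party; starts a JVM through jpype on first use

    conn = jaydebeapi.connect(
        CALCITE_DRIVER,
        calcite_jdbc_url(model_path),
        jars=list(jar_paths),
    )
    try:
        cursor = conn.cursor()
        try:
            cursor.execute(sql)
            columns = [desc[0] for desc in cursor.description]
            return columns, cursor.fetchall()
        finally:
            cursor.close()
    finally:
        conn.close()
```

A call would look like query_calcite("model.json", ["calcite-all.jar"],
"SELECT 1"), where both file names are made up for illustration.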

Although the former is more straightforward, the latter achieves better
performance (see [1] for instance), since that is exactly what Arrow is
designed for. I've created two Jupyter notebooks [2] showing each solution.
What would you recommend? Is there an even better approach?
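
A rough illustration of where the difference comes from, as a sketch rather
than what the notebooks do: a DB-API driver such as jaydebeapi hands back row
tuples, which then have to be pivoted into columns in Python before they can
become an Arrow table or a Pandas DataFrame, whereas the Arrow JDBC adapter
builds the columns inside the JVM. The helper names below are hypothetical,
and pyarrow and pandas are assumed to be installed for the second function.

```python
def rows_to_columns(column_names, rows):
    """Pivot DB-API row tuples into a {column name: list of values} mapping."""
    columns = {name: [] for name in column_names}
    for row in rows:
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


def rows_to_pandas_via_arrow(column_names, rows):
    """Build an Arrow table from fetched rows, then convert it to Pandas.

    This is the slow, row-by-row path done on the Python side; it is shown
    only to make the extra work visible.
    """
    import pyarrow as pa  # third-party, imported lazily

    table = pa.table(rows_to_columns(column_names, rows))
    return table.to_pandas()
```

With the Arrow route the per-value loop in rows_to_columns is not needed on
the Python side at all, which is the point made in [1].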

Thanks,

Nicola

[1] https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html
[2] https://github.com/nvitucci/calcite-sparql/tree/v0.0.2/examples/python

Re: Using Calcite with Python

Posted by Nicola Vitucci <ni...@gmail.com>.

Thanks, Jacques. I looked at dask-sql a few days ago, but it only uses
Calcite (via jpype) for query planning. I'll follow your work on GraalVM
with interest.

Nicola

On Tue, Feb 1, 2022 at 00:58, Jacques Nadeau <ja...@apache.org> wrote:

> A couple of related (possibly useful?) pointers here:
>
>    - Dask-sql [1] uses Calcite in a python context. Might be some good
>    stuff to leverage there.
>    - I'm working on compiling Calcite as a GraalVM shared native library
>    [2] as part of Substrait [3] with the goal of ultimately having a friendly
>    C binding [4] for use in non-jvm worlds. This connects to work being done
>    by others to support tools like Arrow and Velox [5] as Substrait targets
>    (and thus completing the path from c interface to native execution via
>    Calcite).
>
> [1] https://github.com/dask-contrib/dask-sql
> [2] https://issues.apache.org/jira/browse/CALCITE-4786
> [3] https://github.com/substrait-io/substrait/pull/120
> [4] https://github.com/jacques-n/substrait/pull/3
> [5] https://github.com/oap-project/gazelle-jni/tree/velox_dev

Re: Using Calcite with Python

Posted by Gavin Ray <ra...@gmail.com>.

I have nothing of value to add, but:

> [5] https://github.com/oap-project/gazelle-jni/tree/velox_dev

Hot damn this is neat

On Mon, Jan 31, 2022 at 7:58 PM Jacques Nadeau <ja...@apache.org> wrote:

> A couple of related (possibly useful?) pointers here:
>
>    - Dask-sql [1] uses Calcite in a python context. Might be some good
>    stuff to leverage there.
>    - I'm working on compiling Calcite as a GraalVM shared native library
>    [2] as part of Substrait [3] with the goal of ultimately having a friendly
>    C binding [4] for use in non-jvm worlds. This connects to work being done
>    by others to support tools like Arrow and Velox [5] as Substrait targets
>    (and thus completing the path from c interface to native execution via
>    Calcite).
>
> [1] https://github.com/dask-contrib/dask-sql
> [2] https://issues.apache.org/jira/browse/CALCITE-4786
> [3] https://github.com/substrait-io/substrait/pull/120
> [4] https://github.com/jacques-n/substrait/pull/3
> [5] https://github.com/oap-project/gazelle-jni/tree/velox_dev

Re: Using Calcite with Python

Posted by Jacques Nadeau <ja...@apache.org>.

A couple of related (possibly useful?) pointers here:

   - Dask-sql [1] uses Calcite in a Python context. There might be some
   good stuff to leverage there.
   - I'm working on compiling Calcite as a GraalVM shared native library
   [2] as part of Substrait [3], with the goal of ultimately having a friendly
   C binding [4] for use in non-JVM worlds. This connects to work being done
   by others to support tools like Arrow and Velox [5] as Substrait targets
   (thus completing the path from a C interface to native execution via
   Calcite).


[1] https://github.com/dask-contrib/dask-sql
[2] https://issues.apache.org/jira/browse/CALCITE-4786
[3] https://github.com/substrait-io/substrait/pull/120
[4] https://github.com/jacques-n/substrait/pull/3
[5] https://github.com/oap-project/gazelle-jni/tree/velox_dev

On Mon, Jan 31, 2022 at 3:32 PM Nicola Vitucci <ni...@gmail.com>
wrote:

> Hi Eugen, Michael, Gavin,
>
> Thank you very much for your input. Answering to your suggestions:
>
> - Phoenix client: I saw it but decided not to use it because it does not
> seem very active and up to date (its Avatica version is 1.10, while latest
> is 1.20). I may still give it a try though.
> - Arrow Flight: I think it can be very useful especially, like Michael
> mentioned, if it were integrated with Avatica as a transport; at the
> moment, though, it is not.
>
> I am basically looking for a (relatively) easy and ready to implement, easy
> to keep up to date, and reasonably performant solution. Although it incurs
> some overhead, a solution based on Python + Java seems to me the most
> reasonable for the time being. Do you have any other suggestions or
> recommendations?
>
> Thanks again,
>
> Nicola

Re: Using Calcite with Python

Posted by Nicola Vitucci <ni...@gmail.com>.

Hi Eugen, Michael, Gavin,

Thank you very much for your input. In response to your suggestions:

- Phoenix client: I saw it but decided not to use it because it does not
seem very active or up to date (its Avatica version is 1.10, while the
latest is 1.20). I may still give it a try though.
- Arrow Flight: I think it could be very useful, especially if, as Michael
mentioned, it were integrated with Avatica as a transport; at the moment,
though, it is not.

I am basically looking for a solution that is (relatively) easy and ready to
implement, easy to keep up to date, and reasonably performant. Although it
incurs some overhead, a solution based on Python + Java seems to me the most
reasonable for the time being. Do you have any other suggestions or
recommendations?

Thanks again,

Nicola



On Mon, Jan 31, 2022 at 17:04, Michael Mior <mm...@apache.org> wrote:

> Flight is definitely another consideration for the future. Personally I
> think it would be most interesting to integrate Flight with Avatica as an
> alternative transport. But it would certainly also be useful to allow the
> Arrow adapter to connect to any Flight endpoint.
>
> --
> Michael Mior
> mmior@apache.org

Re: Using Calcite with Python

Posted by Michael Mior <mm...@apache.org>.

Flight is definitely another consideration for the future. Personally I
think it would be most interesting to integrate Flight with Avatica as an
alternative transport. But it would certainly also be useful to allow the
Arrow adapter to connect to any Flight endpoint.

--
Michael Mior
mmior@apache.org


On Mon, Jan 31, 2022 at 10:00, Gavin Ray <ra...@gmail.com> wrote:

> This is really interesting stuff you've done in the example notebooks
>
> Nicola & Michael, I wonder if you could benefit from the recently-released
> Arrow Flight SQL?
>
> https://www.dremio.com/subsurface/arrow-flight-and-arrow-flight-sql-accelerating-data-movement/
>
> I have asked Jacques about this a bit -- it's meant to be a standardization
> for communicating SQL queries and metadata with Arrow.
> I'm not intimately familiar with it, but it seems like it could be a good
> base to build a Calcite backend for Arrow from?
>
> They have a pretty thorough Java example in the repository:
>
> https://github.com/apache/arrow/blob/968e6ea488c939c0e1f2bfe339a5a9ed1aed603e/java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/example/FlightSqlExample.java#L169-L180

Re: Using Calcite with Python

Posted by Gavin Ray <ra...@gmail.com>.

This is really interesting stuff you've done in the example notebooks.

Nicola & Michael, I wonder if you could benefit from the recently-released
Arrow Flight SQL?
https://www.dremio.com/subsurface/arrow-flight-and-arrow-flight-sql-accelerating-data-movement/

I have asked Jacques about this a bit -- it's meant to be a standardization
for communicating SQL queries and metadata with Arrow.
I'm not intimately familiar with it, but it seems like it could be a good
base on which to build a Calcite backend for Arrow?

They have a pretty thorough Java example in the repository:
https://github.com/apache/arrow/blob/968e6ea488c939c0e1f2bfe339a5a9ed1aed603e/java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/example/FlightSqlExample.java#L169-L180

On Mon, Jan 31, 2022 at 8:47 AM Michael Mior <mm...@apache.org> wrote:

> You may want to keep an eye on CALCITE-2040 (
> https://issues.apache.org/jira/browse/CALCITE-2040). I have a student who
> is working on a Calcite adapter for Apache Arrow. We're basically hung up
> waiting on the Arrow team to release a compatible JAR. This still won't
> fully solve your problem though as the first version of the adapter is only
> capable of reading from Arrow files. However, the goal is eventually to
> allow passing a memory reference into the adapter so that it would be
> possible to make use of Arrow data which is constructed in-memory
> elsewhere.
> --
> Michael Mior
> mmior@apache.org

Re: Using Calcite with Python

Posted by Michael Mior <mm...@apache.org>.

You may want to keep an eye on CALCITE-2040 (
https://issues.apache.org/jira/browse/CALCITE-2040). I have a student who
is working on a Calcite adapter for Apache Arrow. We're basically hung up
waiting on the Arrow team to release a compatible JAR. This still won't
fully solve your problem though as the first version of the adapter is only
capable of reading from Arrow files. However, the goal is eventually to
allow passing a memory reference into the adapter so that it would be
possible to make use of Arrow data which is constructed in-memory elsewhere.
--
Michael Mior
mmior@apache.org


On Sun, Jan 30, 2022 at 17:36, Nicola Vitucci <ni...@gmail.com> wrote:

> Hi all,
>
> What would be the best way to use Calcite with Python? I've come up with
> two potential solutions:
>
> - using the jaydebeapi package, to connect via the JDBC driver directly
> from a JVM created via jpype;
> - using Apache Arrow via the pyarrow package, to connect in basically the
> same way but creating Arrow objects with JdbcToArrowUtils (and optionally
> converting them to Pandas).
>
> Although the former is more straightforward, the latter allows to achieve
> better performance (see [1] for instance) since it's exactly what Arrow is
> for. I've created two Jupyter notebooks [2] showing each solution. What
> would you recommend? Is there an even better approach?
>
> Thanks,
>
> Nicola
>
> [1] https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html
> [2] https://github.com/nvitucci/calcite-sparql/tree/v0.0.2/examples/python
>

Re: Using Calcite with Python

Posted by Eugen Stan <eu...@netdava.com>.

Hi Nicola,

It's a question I was asking myself the other day.
I don't know the answer, but I do have a direction to explore:
the Avatica client.

There is a nice description and diagram here:
https://calcite.apache.org/avatica/docs/

There is also a list of clients further down that page.

See
https://calcite.apache.org/avatica/docs/#apache-phoenix-database-adapter-for-python
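
A minimal sketch of what that route could look like, assuming an Avatica
HTTP server is already running in front of Calcite and that the phoenixdb
package is installed. The host, the port (8765 is the Phoenix Query Server
default, used here only as a placeholder), and the helper names are
assumptions; whether phoenixdb works against a given non-Phoenix Avatica
server is something to verify.

```python
def avatica_url(host="localhost", port=8765):
    """Build the HTTP endpoint URL of an Avatica server."""
    return f"http://{host}:{port}/"


def run_query(sql, host="localhost", port=8765):
    """Send one query to an Avatica server and return all rows."""
    import phoenixdb  # third-party Avatica client, imported lazily

    conn = phoenixdb.connect(avatica_url(host, port), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

No JVM is started on the Python side here; the trade-off is that a server
process has to be deployed and kept running.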

Please let me know how it goes and what you find out.


On 31.01.2022 00:35, Nicola Vitucci wrote:
> Hi all,
> 
> What would be the best way to use Calcite with Python? I've come up with
> two potential solutions:
> 
> - using the jaydebeapi package, to connect via the JDBC driver directly
> from a JVM created via jpype;
> - using Apache Arrow via the pyarrow package, to connect in basically the
> same way but creating Arrow objects with JdbcToArrowUtils (and optionally
> converting them to Pandas).
> 
> Although the former is more straightforward, the latter allows to achieve
> better performance (see [1] for instance) since it's exactly what Arrow is
> for. I've created two Jupyter notebooks [2] showing each solution. What
> would you recommend? Is there an even better approach?
> 
> Thanks,
> 
> Nicola
> 
> [1] https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html
> [2] https://github.com/nvitucci/calcite-sparql/tree/v0.0.2/examples/python
> 

Regards,
-- 
Eugen Stan

+40770 941 271  / https://www.netdava.com