Posted to dev@calcite.apache.org by Ben Teeuwen <be...@booking.com.INVALID> on 2019/01/29 16:55:50 UTC

calcite sql applied on spark dataframes

Hi all,

I'm interested in trying out Calcite with the goal of applying the same SQL
statement in two different setups. The first is an offline batch setting
on a Spark DataFrame. The statements would do things like basic additions or
subtractions across multiple columns, or datediff operations on two
timestamp columns. Because it runs on Spark, the SQL statement could be
applied to millions of rows in parallel. The second is an online setup, with
Java services processing individual requests. Spark has its own powerful SQL
engine, but the goal here would be to use Calcite instead, and so rule out
incompatibilities between the Spark SQL engine and the Calcite engine used
on the Java side.
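To make the compatibility concern concrete, here is a minimal Java sketch
(not Calcite or Spark code) in which each expression is defined exactly once
and reused by both a batch path and a per-request path. The `Row` type, its
column names, and the `NIGHTS`/`NET_PRICE` expressions are hypothetical, and
"whole calendar days between the two dates" is only an assumed datediff
semantics. Two separate engines could differ on exactly this kind of detail,
whereas a single shared definition cannot drift between the two setups.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;

class SharedExpressionSketch {

    // Hypothetical row: two timestamp columns and two numeric columns.
    record Row(LocalDateTime checkIn, LocalDateTime checkOut, long price, long discount) {}

    // Each expression is defined once. In the setup described above it would be
    // compiled from one SQL statement by a single engine; here it is plain Java.
    // Assumed "datediff" semantics: whole calendar days between the two dates.
    static final ToLongFunction<Row> NIGHTS =
            r -> ChronoUnit.DAYS.between(r.checkIn().toLocalDate(), r.checkOut().toLocalDate());

    // Column arithmetic: price minus discount.
    static final ToLongFunction<Row> NET_PRICE = r -> r.price() - r.discount();

    // Offline path: evaluate over many rows in parallel (a stand-in for a DataFrame job).
    static List<Long> evaluateBatch(List<Row> rows, ToLongFunction<Row> expr) {
        return rows.parallelStream()
                .map(r -> expr.applyAsLong(r))
                .collect(Collectors.toList()); // keeps the input order
    }

    // Online path: evaluate the same expression for a single request.
    static long evaluateOne(Row row, ToLongFunction<Row> expr) {
        return expr.applyAsLong(row);
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(LocalDateTime.of(2019, 1, 28, 23, 0), LocalDateTime.of(2019, 1, 29, 1, 0), 100, 15),
                new Row(LocalDateTime.of(2019, 1, 1, 14, 0), LocalDateTime.of(2019, 1, 4, 10, 0), 300, 0));

        System.out.println(evaluateBatch(rows, NIGHTS));      // [1, 3]
        System.out.println(evaluateBatch(rows, NET_PRICE));   // [85, 300]
        System.out.println(evaluateOne(rows.get(0), NIGHTS)); // 1
    }
}
```

Note that the first row spans only two hours but counts as one day under the
assumed calendar-day semantics; an engine that measured elapsed 24-hour
periods would return 0 for it, which is the kind of mismatch the thread is
trying to avoid.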

Does anyone have experience with such an approach? I searched the mailing
list archive for messages about Spark but haven't found any.

Ben

Re: [External] Re: calcite sql applied on spark dataframes

Posted by Ben Teeuwen <be...@booking.com.INVALID>.
Thanks, Julian, for the link.

Has anyone gained significant experience interacting with Spark since you
reported this ticket in April 2017?

On Tue, Jan 29, 2019 at 8:56 PM Julian Hyde <jh...@apache.org> wrote:

> Did you see this: https://issues.apache.org/jira/browse/CALCITE-1737 ?

Re: calcite sql applied on spark dataframes

Posted by Julian Hyde <jh...@apache.org>.
Did you see this: https://issues.apache.org/jira/browse/CALCITE-1737 ?
