Posted to dev@calcite.apache.org by Maxime Jattiot <mj...@opensense.fr> on 2016/03/17 22:32:05 UTC

Wondering if Calcite is a good fit for my use case

Hello everyone,

Your project seems very interesting, but I am a bit lost in the ocean of possibilities. Let me explain:

We are currently developing an application with a microservices architecture. Most of the services are written in Python, but a few are in Java or Go.
Of course, a lot of them need to persist data, and we are using a CQRS approach. The idea is that our microservices will use two kinds of databases: one for ACID/OLTP and another for OLAP.

Our main issue is that we want to be agnostic of the underlying databases, because we will install our application in different clients' environments.
These clients could have clusters of Mongo, Cassandra or Couchbase for OLTP, or Spark for OLAP.

So our first idea was to stick to the SQL query language with ODBC drivers to stay agnostic, and to find a way to translate our SQL queries into whatever query language each database accepts while keeping good performance.

Your framework seems to provide both a SQL translator and an optimizer. However, it raised a few questions:
- It seems you don’t have support for insert, update and delete, but when digging I see that Apache Hive is using your framework and now supports insert, update and delete. How come? Are they using only part of your framework?
- Also, you said you are a framework, but can you work as an engine that is queried through ODBC/JDBC? Is that the purpose of Avatica? If yes, can we cluster it to handle queries from many clients?

As you can tell, I am a bit lost and I am wondering what we should do for our use case. So any insights are welcome, such as what you would do if you were us.

Thank you very much,

Kind regards,

Maxime JATTIOT

Re: Wondering if Calcite is a good fit for my use case

Posted by Josh Elser <jo...@gmail.com>.

https://bitbucket.org/lalinsky/python-phoenixdb is the project, btw.

Not sure if Lukáš follows this mailing list.

Maxime Jattiot wrote:
> Hello everyone,
>
> Thank you for all your replies. It clarifies a lot.
> I will dig the others Apache projects that use Calcite and will come back
> to you if I have further questions.
> As suggested by Josh I'll also have a look at who's implementing this
> Python Database API.
>
> Thank you again and have a great week !
>
> Maxime

Re: Wondering if Calcite is a good fit for my use case

Posted by Julian Hyde <jh...@apache.org>.

It was always one of the goals of Avatica that you could use the same RPC protocol & back-end server for JDBC and ODBC. I also wanted to ensure that if you wrote your own implementation of the Avatica server you could use the remote clients (ODBC or JDBC driver) unchanged. These goals drove the design choices we made in Avatica.

> On Mar 21, 2016, at 6:15 AM, Jesse Yates <je...@gmail.com> wrote:
> 
> I don't imagine it would be too hard, as long as
> we keep the Avatica protocol clearly defined.
> 
> Something you would be open to @Josh/@Julian?

It depends on how you define “hard”. I think it would take a couple of weeks to get a basic ODBC driver working, and a few more weeks to get to production quality (including an installer). But by all means prove me wrong! I’d be overjoyed if someone took this on.

I think I’ve done enough “little red hen” for Avatica. Time for others to step up.

Julian

Re: Wondering if Calcite is a good fit for my use case

Posted by Josh Elser <jo...@gmail.com>.

Yes -- I would love to find the time/motivation to figure out how to 
make one :). It's been a while since I've gotten my hands dirty in 
C/C++. If you (or anyone) wants to hack on something, I'd be happy to
help in whatever way possible.

The stable protocol should be there already (just need some better tests 
to double check it between releases).

Jesse Yates wrote:
> I think it would also be compelling to have an open source ODBC driver
> implementation for Avatica. AFAIK there are no 'good', open source
> implementations - there are a couple of vendors that sell them (or SDKs to
> build them), a dead SourceForge project <http://odbcjdbc.sourceforge.net/>,
> or folks roll their own*. I don't imagine it would be too hard, as long as
> we keep the Avatica protocol clearly defined.
>
> Something you would be open to @Josh/@Julian?
>
> --Jesse
>
> * would love to be correct here

Re: Wondering if Calcite is a good fit for my use case

Posted by Jesse Yates <je...@gmail.com>.

I think it would also be compelling to have an open source ODBC driver
implementation for Avatica. AFAIK there are no 'good', open source
implementations - there are a couple of vendors that sell them (or SDKs to
build them), a dead SourceForge project <http://odbcjdbc.sourceforge.net/>,
or folks roll their own*. I don't imagine it would be too hard, as long as
we keep the Avatica protocol clearly defined.

Something you would be open to @Josh/@Julian?

--Jesse

* would love to be correct here

On Mon, Mar 21, 2016 at 1:18 AM Maxime Jattiot <mj...@opensense.fr>
wrote:

> Hello everyone,
>
> Thank you for all your replies. It clarifies a lot.
> I will dig the others Apache projects that use Calcite and will come back
> to you if I have further questions.
> As suggested by Josh I'll also have a look at who's implementing this
> Python Database API.
>
> Thank you again and have a great week !
>
> Maxime

Re: Wondering if Calcite is a good fit for my use case

Posted by Maxime Jattiot <mj...@opensense.fr>.

Hello everyone,

Thank you for all your replies. They clarify a lot.
I will dig into the other Apache projects that use Calcite and will come back
to you if I have further questions.
As suggested by Josh, I'll also have a look at who's implementing this
Python Database API.

Thank you again and have a great week!

Maxime



On Fri, Mar 18, 2016 at 8:25 PM Josh Elser <jo...@gmail.com> wrote:

> In general, Avatica is just an abstraction of JDBC over an RPC mechanism
> via a standalone server. Presently that RPC mechanism is HTTP with
> Avatica "protocol" data serialized using JSON or Protocol Buffers.
>
> The general approach I've been trying to follow (agreed upon with Nick a
> while back) was to try to keep Avatica horizontally scalable, not
> requiring clients to be sticky. Avatica servers do not need to
> communicate with each other and should be capable of operating in
> parallel with other servers. It is an important caveat that clients
> should still try to be sticky (for performance reasons), but it's been a
> goal to not make this a requirement. I've done some preliminary testing
> with Avatica behind HAProxy with initial success.
>
> Long term, I'd love to provide an array of libraries/drivers for
> interfacing with Avatica in the language of your choice. We have the
> Java-based implementation (reference implementation) that Avatica
> directly provides in the form of a JDBC driver. We've also seen a Python
> driver, targeted for Phoenix, that implements the Python Database 2.0
> API. ODBC is also a moving target that we can hopefully nail down some day.
>
> If you have more specific questions WRT architecture/performance, I'd be
> happy to try to answer them, Maxime!
>
> - Josh

Re: Wondering if Calcite is a good fit for my use case

Posted by Josh Elser <jo...@gmail.com>.

In general, Avatica is just an abstraction of JDBC over an RPC mechanism 
via a standalone server. Presently that RPC mechanism is HTTP with 
Avatica "protocol" data serialized using JSON or Protocol Buffers.

The general approach I've been trying to follow (agreed upon with Nick a 
while back) was to try to keep Avatica horizontally scalable, not 
requiring clients to be sticky. Avatica servers do not need to 
communicate with each other and should be capable of operating in 
parallel with other servers. It is an important caveat that clients 
should still try to be sticky (for performance reasons), but it's been a 
goal to not make this a requirement. I've done some preliminary testing 
with Avatica behind HAProxy with initial success.
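
A minimal sketch of the "sticky when possible, but not required" routing idea described above. It is not Avatica or HAProxy code: the server URLs, the `pick_server` helper and the notion of a `healthy` set are hypothetical names used only for illustration. Whether a live connection can really move between servers depends on the Avatica version and the backing database.

```python
import hashlib

# Hypothetical Avatica server URLs behind a load balancer (assumption, not from the thread).
SERVERS = [
    "http://avatica-1:8765",
    "http://avatica-2:8765",
    "http://avatica-3:8765",
]


def pick_server(connection_id, servers, healthy):
    """Prefer the same server for a given connection id (sticky), but fall
    back to any healthy server: stickiness is an optimisation, not a rule."""
    if not any(server in healthy for server in servers):
        raise RuntimeError("no healthy server available")
    # Hash the connection id to a stable preferred slot in the server list.
    digest = hashlib.sha256(connection_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    # Walk the list from the preferred slot until a healthy server is found.
    for offset in range(len(servers)):
        server = servers[(start + offset) % len(servers)]
        if server in healthy:
            return server
```

Calling `pick_server("conn-42", SERVERS, set(SERVERS))` repeatedly returns the same URL; removing that URL from the healthy set makes the same call return one of the remaining servers instead of failing.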

Long term, I'd love to provide an array of libraries/drivers for
interfacing with Avatica in the language of your choice. We have the
Java-based implementation (reference implementation) that Avatica
directly provides in the form of a JDBC driver. We've also seen a Python
driver, targeted for Phoenix, that implements the Python Database API 2.0
(PEP 249). ODBC is also a moving target that we can hopefully nail down some day.
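
To illustrate why a Python Database API driver matters for staying database-agnostic, here is a small sketch written only against PEP 249 calls (`cursor()`, `execute()`, `fetchall()`). It uses the standard-library `sqlite3` module as a stand-in; the `users` table and the `fetch_names` and `demo` helpers are made up for the example. With an Avatica-backed driver, only the connect call should need to change, although the parameter placeholder style (`paramstyle`) can differ between drivers.

```python
import sqlite3


def fetch_names(conn, min_id):
    """Query through the generic PEP 249 surface only, so any compliant
    connection object (not just sqlite3) could be passed in."""
    cur = conn.cursor()
    try:
        # "?" is sqlite3's placeholder style; other drivers may use another.
        cur.execute("SELECT name FROM users WHERE id >= ? ORDER BY id", (min_id,))
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()


def demo():
    # Stand-in database: an in-memory SQLite instance with a hypothetical table.
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        cur.executemany(
            "INSERT INTO users VALUES (?, ?)",
            [(1, "ada"), (2, "bob"), (3, "cy")],
        )
        conn.commit()
        return fetch_names(conn, 2)
    finally:
        conn.close()
```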

If you have more specific questions WRT architecture/performance, I'd be 
happy to try to answer them, Maxime!

- Josh

<snip/>

> Josh, Can you weigh in on options for scaling out Avatica?
>
> Julian
>
>
>
>> On Mar 17, 2016, at 2:32 PM, Maxime Jattiot<mj...@opensense.fr>  wrote:
>>

<snip/>

>> - Also you said you are a framework but can you work as an engine that is queried through ODBC/JDBC ? Is that the purpose of Avatica ? If yes can we cluster it to handle many clients queries ?
>>

<snip/>

Re: Wondering if Calcite is a good fit for my use case

Posted by Julian Hyde <jh...@apache.org>.

It sounds as if Calcite is a very good fit. Data virtualization (which is how I’d describe your use case) was one of the main goals I had in mind when creating Calcite.

On the question of whether Calcite is a framework or engine, let’s look at how some systems use Calcite:

* Hive uses Calcite’s query planning framework, but not its SQL parser. It has its own engine.

* Drill goes a bit further, and uses both the parser/validator and planning framework. It has its own engine.

* Kylin goes further still, and uses the parser/validator, planning framework, and also Calcite’s engine for anything its own engine cannot do.

* Phoenix does much the same as Kylin, but also uses Avatica for JDBC.

The “engine” is the “enumerable convention”: the ability to generate code for each relational operator that is, basically, a Java iterator, plus implementations of the SQL built-in operators. It doesn’t scale beyond a single JVM, but is nevertheless useful for catching what the underlying engine cannot do.
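
As a rough analogy for "each relational operator is basically an iterator", here is a sketch using Python generators. It is not Calcite's generated Java code; the `scan`, `filter_rows` and `project` helpers and the `EMPS` data are hypothetical and only show how operators can be chained so that rows are pulled one at a time through the plan.

```python
# Hypothetical in-memory table used only for this illustration.
EMPS = [
    {"name": "ada", "dept": 10, "salary": 120},
    {"name": "bob", "dept": 20, "salary": 100},
    {"name": "cy", "dept": 10, "salary": 95},
]


def scan(rows):
    """Table scan: yield every row of the source."""
    for row in rows:
        yield row


def filter_rows(child, predicate):
    """Filter: pull rows from the child operator, keep those that match."""
    for row in child:
        if predicate(row):
            yield row


def project(child, columns):
    """Project: keep only the requested columns, in order."""
    for row in child:
        yield tuple(row[c] for c in columns)


def run_query():
    # Roughly: SELECT name, salary FROM emps WHERE dept = 10
    plan = project(
        filter_rows(scan(EMPS), lambda r: r["dept"] == 10),
        ["name", "salary"],
    )
    return list(plan)
```

Nothing is evaluated until `list(plan)` starts pulling rows, which mirrors how an iterator-based operator tree runs inside a single process.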

DML has not been a focus (most of our applications are analytics, and therefore some other system is writing the data) but it fits into the architecture just fine, and in fact Phoenix are doing a lot of DML work. We have basic support for INSERT, which we could strengthen, and we could add support for UPDATE, MERGE and DELETE.

Josh, Can you weigh in on options for scaling out Avatica?

Julian
