Posted to dev@calcite.apache.org by Ron Wheeler <rw...@artifact-software.com> on 2016/01/25 04:18:01 UTC

Calcite Packaging question from a newbee

In the "marketing" videos, Calcite seems to be based on the idea that 
the whole question of database access can be best solved by a modular 
approach.
I was surprised to see a lot of storage layer stuff mixed in with the core.

I want to add an adapter and incorporate this into an application that 
has its own database abstraction which currently supports Jackrabbit 
with an in-memory configuration.
I can not afford to add a whole bunch of code and third party libraries 
that are not essential.

Is there a plan to produce a lightweight core package that is suitable 
for use with an adapter?

Ron

-- 
Ron Wheeler
President
Artifact Software Inc
email: rwheeler@artifact-software.com
skype: ronaldmwheeler
phone: 866-970-2435, ext 102

Re: Calcite Packaging question from a newbee

Posted by Julian Hyde <jh...@apache.org>.

Sure - if you have more questions, ask on this list. I’ve found that you can achieve a lot just by implementing the table interface. Then you can start to push down filter, project etc. as you tune the adapter.
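
To illustrate the progression described above (start with a plain table scan, then push a filter down into the adapter), here is a minimal, self-contained sketch. The interfaces `SimpleTable` and `FilterableSimpleTable` and the class `InMemoryTable` are hypothetical stand-ins invented for this example; they are not Calcite's own interfaces, and a real adapter would implement the ones in Calcite's schema package instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration only; NOT Calcite's interfaces.
interface SimpleTable {
    // Step 1: return every row and let the query engine filter afterwards.
    List<Object[]> scan();
}

interface FilterableSimpleTable extends SimpleTable {
    // Step 2: the engine hands the filter to the adapter ("push down"),
    // so only matching rows leave the storage layer.
    List<Object[]> scan(Predicate<Object[]> filter);
}

class InMemoryTable implements FilterableSimpleTable {
    private final List<Object[]> rows;
    int rowsReturned = 0; // running total of rows handed back to callers

    InMemoryTable(List<Object[]> rows) {
        this.rows = rows;
    }

    @Override
    public List<Object[]> scan() {
        rowsReturned += rows.size();
        return new ArrayList<>(rows);
    }

    @Override
    public List<Object[]> scan(Predicate<Object[]> filter) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            if (filter.test(row)) {
                out.add(row);
            }
        }
        rowsReturned += out.size();
        return out;
    }
}

class PushDownDemo {
    static InMemoryTable sampleTable() {
        return new InMemoryTable(Arrays.asList(
            new Object[] {1, "alice"},
            new Object[] {2, "bob"},
            new Object[] {3, "carol"}));
    }

    public static void main(String[] args) {
        InMemoryTable table = sampleTable();
        Predicate<Object[]> idAboveOne = row -> ((Integer) row[0]) > 1;

        System.out.println("full scan rows:     " + table.scan().size());
        System.out.println("filtered scan rows: " + table.scan(idAboveOne).size());
    }
}
```

The point of the second interface is only that the adapter, rather than the engine, decides which rows to return; the same idea would extend to projections.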

> On Jan 25, 2016, at 2:57 PM, Ron Wheeler <rw...@artifact-software.com> wrote:
> 
> Very good explanation. There may be some elements in this that could be added to the website.
> 
> I am just getting started and am pretty excited about the possibility of using Calcite in our ADTransform package.
> ADTransform has a pretty big footprint in its current state due to some pretty powerful libraries such as JasperReports that are pretty big.
> 
> After I asked the question, I had to dig a bit deeper to get the CSV Adapter demo running and got a bit of a picture about the jar files that exist.
> 
> I should have looked at the Maven Central repo to see that I can in fact get the core as a separate dependency even if the sources are in a single git project.
> 
> I am sorry for putting you through such a long bit of writing but it is very helpful in looking forward.
> 
> I have the suspicion that I would have saved a lot of grief and code writing had I known about Calcite 4 years ago.
> I have been using SQL (Oracle and MySQL) since 1982 and am a big fan.
> 
> I think that a lot of the plug-ins that we wrote to add functionality to ADTransform will be made obsolete by a plugin that accepts SQL statements instead of a string of parameters.
> 
> I may have some questions and issues relating to our audit trail and our error log.
> For example, we identify individual records that fail a transformation or validation.
> "Uniqueness test failed on row 270 key "abc" is a duplicate of record 185."
> In row 200 "manager" rwheeler not found in "int_person".
> These things are not something SQL is very good at reporting.
> 
> I am sure that I will have lots of questions if I write an adapter. Having never worked on the internals of ORACLE or MySQL, I am finding that the JavaDocs are a bit daunting at times and include concepts that I have never thought about.
> 
> Thanks for your response.
> 
> Ron
> 
> On 25/01/2016 5:16 PM, Julian Hyde wrote:
>> Ron,
>> 
>> I’ll attempt to answer but I might have misunderstood your use case. So if I am off the mark, please describe the architecture you want in more detail.
>> 
>> Let’s suppose you have a NoSQL database X and your application wants to speak SQL to it. You would include three Calcite modules: core, avatica, and the module for database X. The adapter for each database tends to be in a separate module (mongodb, spark, splunk, csv). JDBC is in core, but doesn’t bring in any dependencies.
>> 
>> Core doesn’t contain dependencies on third-party systems. It does depend on things needed by the core functionality, for example Jackson, because we want people to be able to write their models in JSON.
>> 
>> Reading pom.xml, the runtime dependencies of core are as follows:
>> * calcite-avatica
>>   * protobuf
>>   * jackson
>> * calcite-linq4j
>> * commons-dbcp
>> * findbugs-jsr305
>> * guava
>> * eigenbase-properties
>> * janino
>> * pentaho-aggdesigner-algorithm
>> 
>> I don’t think any of those has significant downstream dependencies.
>> 
>> I’m very happy to have a discussion of what should be in core, and what its dependencies should be. There’s no “perfect” decomposition of a project into modules, but we can get a little nearer to perfection if we listen to how people (such as you) are deploying the project in the real world.
>> 
>> After all that, if core is too heavyweight, then maybe you want *remote* Calcite rather than *embedded* Calcite. On your client you can include JUST avatica. On your server you would run avatica-server and include avatica and core and anything else you desire. Avatica has extremely low dependencies - you need either Jackson or Protobuf (depending on how you are encoding the RPCs). We didn’t even include Guava, even though we lean heavily on it elsewhere in Calcite.
>> 
>> Julian
>> 
>>> On Jan 24, 2016, at 7:18 PM, Ron Wheeler <rw...@artifact-software.com> wrote:
>>> 
>>> In the "marketing" videos, Calcite seems to be based on the idea that the whole question of database access can be best solved by a modular approach.
>>> I was surprised to see a lot of storage layer stuff mixed in with the core.
>>> 
>>> I want to add an adapter and incorporate this into an application that has its own database abstraction which currently supports Jackrabbit with an in-memory configuration.
>>> I can not afford to add a whole bunch of code and third party libraries that are not essential.
>>> 
>>> Is there a plan to produce a lightweight core package that is suitable for use with an adapter?
>>> 
>>> Ron
>>> 
>>> -- 
>>> Ron Wheeler
>>> President
>>> Artifact Software Inc
>>> email: rwheeler@artifact-software.com
>>> skype: ronaldmwheeler
>>> phone: 866-970-2435, ext 102
>>> 
>> 
> 
> 
> -- 
> Ron Wheeler
> President
> Artifact Software Inc
> email: rwheeler@artifact-software.com
> skype: ronaldmwheeler
> phone: 866-970-2435, ext 102
>

Re: Calcite Packaging question from a newbee

Posted by Ron Wheeler <rw...@artifact-software.com>.

Very good explanation. There may be some elements in this that could be 
added to the website.

I am just getting started and am pretty excited about the possibility of 
using Calcite in our ADTransform package.
ADTransform has a pretty big footprint in its current state due to some 
pretty powerful libraries such as JasperReports that are pretty big.

After I asked the question, I had to dig a bit deeper to get the CSV 
Adapter demo running and got a bit of a picture about the jar files that 
exist.

I should have looked at the Maven Central repo to see that I can in fact 
get the core as a separate dependency even if the sources are in a 
single git project.

I am sorry for putting you through such a long bit of writing but it is 
very helpful in looking forward.

I have the suspicion that I would have saved a lot of grief and code 
writing had I known about Calcite 4 years ago.
I have been using SQL (Oracle and MySQL) since 1982 and am a big fan.

I think that a lot of the plug-ins that we wrote to add functionality to 
ADTransform will be made obsolete by a plugin that accepts SQL 
statements instead of a string of parameters.

I may have some questions and issues relating to our audit trail and our 
error log.
For example, we identify individual records that fail a transformation 
or validation.
"Uniqueness test failed on row 270 key "abc" is a duplicate of record 185."
In row 200 "manager" rwheeler not found in "int_person".
These things are not something SQL is very good at reporting.
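
As a rough illustration of the per-record reporting described above, the following sketch walks a list of keys and emits one message for each duplicate, naming both the failing row and the earlier record it collides with. The class and method names are hypothetical, row numbers are assumed to be 1-based, and the message wording is only modeled on the example quoted above; it is not ADTransform's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of ADTransform or Calcite.
class UniquenessCheck {

    // Returns one message per duplicate key. Each message names the row
    // where the duplicate was found and the row where the key first appeared.
    static List<String> findDuplicates(List<String> keys) {
        Map<String, Integer> firstSeen = new HashMap<>();
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            int rowNumber = i + 1; // assumed 1-based row numbering
            String key = keys.get(i);
            // putIfAbsent returns the earlier row number if the key was
            // already recorded, or null if this is its first appearance.
            Integer earlier = firstSeen.putIfAbsent(key, rowNumber);
            if (earlier != null) {
                errors.add("Uniqueness test failed on row " + rowNumber
                    + ": key \"" + key + "\" is a duplicate of record "
                    + earlier + ".");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("abc", "def", "abc");
        for (String message : findDuplicates(keys)) {
            System.out.println(message);
        }
    }
}
```

A plain SQL uniqueness constraint would reject the load without saying which rows collided, which is why this kind of check is done row by row here.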

I am sure that I will have lots of questions if I write an adapter. 
Having never worked on the internals of ORACLE or MySQL, I am finding 
that the JavaDocs are a bit daunting at times and include concepts that 
I have never thought about.

Thanks for your response.

Ron

On 25/01/2016 5:16 PM, Julian Hyde wrote:
> Ron,
>
> I’ll attempt to answer but I might have misunderstood your use case. So if I am off the mark, please describe the architecture you want in more detail.
>
> Let’s suppose you have a NoSQL database X and your application wants to speak SQL to it. You would include three Calcite modules: core, avatica, and the module for database X. The adapter for each database tends to be in a separate module (mongodb, spark, splunk, csv). JDBC is in core, but doesn’t bring in any dependencies.
>
> Core doesn’t contain dependencies on third-party systems. It does depend on things needed by the core functionality, for example Jackson, because we want people to be able to write their models in JSON.
>
> Reading pom.xml, the runtime dependencies of core are as follows:
> * calcite-avatica
>    * protobuf
>    * jackson
> * calcite-linq4j
> * commons-dbcp
> * findbugs-jsr305
> * guava
> * eigenbase-properties
> * janino
> * pentaho-aggdesigner-algorithm
>
> I don’t think any of those has significant downstream dependencies.
>
> I’m very happy to have a discussion of what should be in core, and what its dependencies should be. There’s no “perfect” decomposition of a project into modules, but we can get a little nearer to perfection if we listen to how people (such as you) are deploying the project in the real world.
>
> After all that, if core is too heavyweight, then maybe you want *remote* Calcite rather than *embedded* Calcite. On your client you can include JUST avatica. On your server you would run avatica-server and include avatica and core and anything else you desire. Avatica has extremely low dependencies - you need either Jackson or Protobuf (depending on how you are encoding the RPCs). We didn’t even include Guava, even though we lean heavily on it elsewhere in Calcite.
>
> Julian
>
>> On Jan 24, 2016, at 7:18 PM, Ron Wheeler <rw...@artifact-software.com> wrote:
>>
>> In the "marketing" videos, Calcite seems to be based on the idea that the whole question of database access can be best solved by a modular approach.
>> I was surprised to see a lot of storage layer stuff mixed in with the core.
>>
>> I want to add an adapter and incorporate this into an application that has its own database abstraction which currently supports Jackrabbit with an in-memory configuration.
>> I can not afford to add a whole bunch of code and third party libraries that are not essential.
>>
>> Is there a plan to produce a lightweight core package that is suitable for use with an adapter?
>>
>> Ron
>>
>> -- 
>> Ron Wheeler
>> President
>> Artifact Software Inc
>> email: rwheeler@artifact-software.com
>> skype: ronaldmwheeler
>> phone: 866-970-2435, ext 102
>>
>

-- 
Ron Wheeler
President
Artifact Software Inc
email: rwheeler@artifact-software.com
skype: ronaldmwheeler
phone: 866-970-2435, ext 102

Re: Calcite Packaging question from a newbee

Posted by Julian Hyde <jh...@apache.org>.

Ron,

I’ll attempt to answer but I might have misunderstood your use case. So if I am off the mark, please describe the architecture you want in more detail.

Let’s suppose you have a NoSQL database X and your application wants to speak SQL to it. You would include three Calcite modules: core, avatica, and the module for database X. The adapter for each database tends to be in a separate module (mongodb, spark, splunk, csv). JDBC is in core, but doesn’t bring in any dependencies.

Core doesn’t contain dependencies on third-party systems. It does depend on things needed by the core functionality, for example Jackson, because we want people to be able to write their models in JSON.

Reading pom.xml, the runtime dependencies of core are as follows:
* calcite-avatica
  * protobuf
  * jackson
* calcite-linq4j
* commons-dbcp
* findbugs-jsr305
* guava
* eigenbase-properties
* janino
* pentaho-aggdesigner-algorithm

I don’t think any of those has significant downstream dependencies.

I’m very happy to have a discussion of what should be in core, and what its dependencies should be. There’s no “perfect” decomposition of a project into modules, but we can get a little nearer to perfection if we listen to how people (such as you) are deploying the project in the real world.

After all that, if core is too heavyweight, then maybe you want *remote* Calcite rather than *embedded* Calcite. On your client you can include JUST avatica. On your server you would run avatica-server and include avatica and core and anything else you desire. Avatica has extremely low dependencies - you need either Jackson or Protobuf (depending on how you are encoding the RPCs). We didn’t even include Guava, even though we lean heavily on it elsewhere in Calcite.
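
For the remote option, a client might look roughly like the sketch below. It uses only the standard `java.sql` API and assumes that the Avatica client jar is on the classpath and that a server is listening at the given address. The host `localhost`, the port `8765`, and the helper names are assumptions made for this example; the `url` and `serialization` connection properties should be checked against the Avatica documentation for the version in use.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical client sketch; host, port and class name are assumptions.
class RemoteClientSketch {

    // Builds a JDBC URL for Avatica's remote driver. The serialization
    // value selects how RPCs are encoded (for example "json" or "protobuf").
    static String remoteUrl(String host, int port, String serialization) {
        return "jdbc:avatica:remote:url=http://" + host + ":" + port
            + ";serialization=" + serialization;
    }

    public static void main(String[] args) {
        String url = remoteUrl("localhost", 8765, "protobuf");
        // Without the Avatica client jar or a running server this reports
        // the failure instead of throwing.
        try (Connection connection = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !connection.isClosed());
        } catch (SQLException e) {
            System.out.println("Could not connect to " + url + ": " + e.getMessage());
        }
    }
}
```

In this arrangement the application ships only the thin client, and core plus any adapters live on the server side.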

Julian

> On Jan 24, 2016, at 7:18 PM, Ron Wheeler <rw...@artifact-software.com> wrote:
> 
> In the "marketing" videos, Calcite seems to be based on the idea that the whole question of database access can be best solved by a modular approach.
> I was surprised to see a lot of storage layer stuff mixed in with the core.
> 
> I want to add an adapter and incorporate this into an application that has its own database abstraction which currently supports Jackrabbit with an in-memory configuration.
> I can not afford to add a whole bunch of code and third party libraries that are not essential.
> 
> Is there a plan to produce a lightweight core package that is suitable for use with an adapter?
> 
> Ron
> 
> -- 
> Ron Wheeler
> President
> Artifact Software Inc
> email: rwheeler@artifact-software.com
> skype: ronaldmwheeler
> phone: 866-970-2435, ext 102
>