Posted to dev@quickstep.apache.org by Jignesh Patel <ji...@pivotal.io> on 2016/06/14 19:22:18 UTC

Re: [jira] [Commented] (QUICKSTEP-20) Add parser support for SQL window aggregation function

Great point Julian and something that I have been thinking about too. 

I’d love to kick off a discussion to see how we can make this work. I’d also be happy to give a talk on Quickstep to the Calcite team later this summer (July?) and explore this very synergy.

Some of the ORCA guys have also been thinking about this. But in the end it boils down to two things: a) synergy and b) some commitment from the two projects to make this work.

For Quickstep, the goal is quite clear: we want to focus on the key aspects of our platform that relate to fast query execution and flexible scheduling. But we need to do this in a way that is trivially easy for users.

Cheers,
Jignesh  

> Julian Hyde commented on QUICKSTEP-20:
> --------------------------------------
> 
> It sounds as if Quickstep is going down the route of building a full SQL parser, validator, planner. This is fine, but it is a huge amount of work to produce something that is high quality. Have you considered using Apache Calcite? Calcite is written in Java but that shouldn't be too much of an issue because Calcite can work as a pre-processor, producing a physical plan that can be run without any Java in the runtime.
> 
>> Add parser support for SQL window aggregation function
>> ------------------------------------------------------
>> 
>>                Key: QUICKSTEP-20
>>                URL: https://issues.apache.org/jira/browse/QUICKSTEP-20
>>            Project: Apache Quickstep
>>         Issue Type: New Feature
>>         Components: Parser
>>           Reporter: Shixuan Fan
>>             Labels: features, newbie
>> 
>> This is the first part of window aggregation function support. New grammar will be introduced to the parser so that it can understand a window aggregation query.
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)

Re: [jira] [Commented] (QUICKSTEP-20) Add parser support for SQL window aggregation function

Posted by Julian Hyde <jh...@apache.org>.

If you’re going to build a full database, go for it, best of luck to you, but you’re going to need a large community.

But in my opinion, you should focus on your strengths. The community should ask itself what is the one thing that it can do better than anyone else. If you build a library that can be embedded in other projects and in commercial products, you will gain adoption. If you try to solve the whole problem, everyone (including other Apache projects) will be against you.

Research projects and companies have an interest in “going big”. But projects, in my experience, work better if you go small.

That said, it does help to build community if you allow people to contribute stuff that interests them. But make sure you know what your goal is, and how those contributions contribute to that goal.

Julian


> On Jun 15, 2016, at 8:15 AM, Jignesh Patel <ji...@pivotal.io> wrote:
> 
> Great points Julian, especially about algebra. Couldn’t agree more.
> 
> In fact, we have been strong advocates of the viewpoint that it is all about the algebraic framework. Furthermore, we have argued that the relational algebraic framework is the right “core” to build a platform. With it you can go well beyond warehousing/SQL and (with small extensions) also build:
> 
> #1: JSON document stores (see Argo <http://pages.cs.wisc.edu/~chasseur/pubs/argo-short.pdf>), 
> 
> #2: Iterative graph analytics (see Grail <http://www.cs.wisc.edu/~jignesh/publ/Grail.pdf>), 
> 
> #3: Relational learning (see QuickFOIL <http://www.cs.wisc.edu/~jignesh/publ/QuickFoil.pdf>), and
> 
> #4: Biological data management (see Periscope/SQ <http://www.vldb.org/conf/2007/papers/demo/p1406-tata.pdf> and Periscope/GQ <http://www.vldb.org/pvldb/1/1454184.pdf>).
> 
> If all of that is not enough, there are nice synergies between deeper integration of common classes of machine learning and relational data representation. A key idea here is factorized learning, which my student Arun Kumar (co-advised with Naughton) introduced last year <http://pages.cs.wisc.edu/~arun/orion/LearningOverJoinsSIGMOD.pdf>. Arun will present a far deeper follow-on paper <http://pages.cs.wisc.edu/~arun/hamlet/OptFSSIGMOD.pdf> on this topic at SIGMOD in a few weeks. Interestingly, many other papers are starting to build on these initial ideas. There is still a bunch of theory to figure out, but as a research community we are collectively getting very close to nailing that.
> 
> In my keynote @ SIGMOD last year <http://dl.acm.org/citation.cfm?doid=2723372.2723374>, I talked about how theory (see papers above) has shown that with an extended relational algebraic core these seemingly different applications converge to a platform that is powered by a relational core. This converged platform is the long-term vision for Quickstep. Yup — I hear you, I need to write this up for the community. You are right and I’m adding it to my list :-) 
> 
> We have shown prototypes for all of the above, but haven’t put it all together. That is the hard part, and we are at the start of that journey. That effort is also revealing all kinds of interesting systems research issues — so good for the students on the project. Potentially exciting times ahead!
> 
> Cheers,
> Jignesh 
> 
>> On Jun 14, 2016, at 2:32 PM, Julian Hyde <jh...@apache.org> wrote:
>> 
>> Having that representation reduces coupling in your architecture, so is useful even if you don’t decide to use a library for SQL parsing/planning. But I think once you have it you will realize that all of the interesting problems for the project happen after the query has been converted to algebra.
>> 
>> Julian

Re: [jira] [Commented] (QUICKSTEP-20) Add parser support for SQL window aggregation function

Posted by Jignesh Patel <ji...@pivotal.io>.

Great points Julian, especially about algebra. Couldn’t agree more.

In fact, we have been strong advocates of the viewpoint that it is all about the algebraic framework. Furthermore, we have argued that the relational algebraic framework is the right “core” to build a platform. With it you can go well beyond warehousing/SQL and (with small extensions) also build:

#1: JSON document stores (see Argo <http://pages.cs.wisc.edu/~chasseur/pubs/argo-short.pdf>), 

#2: Iterative graph analytics (see Grail <http://www.cs.wisc.edu/~jignesh/publ/Grail.pdf>), 

#3: Relational learning (see QuickFOIL <http://www.cs.wisc.edu/~jignesh/publ/QuickFoil.pdf>), and

#4: Biological data management (see Periscope/SQ <http://www.vldb.org/conf/2007/papers/demo/p1406-tata.pdf> and Periscope/GQ <http://www.vldb.org/pvldb/1/1454184.pdf>).

If all of that is not enough, there are nice synergies between deeper integration of common classes of machine learning and relational data representation. A key idea here is factorized learning, which my student Arun Kumar (co-advised with Naughton) introduced last year <http://pages.cs.wisc.edu/~arun/orion/LearningOverJoinsSIGMOD.pdf>. Arun will present a far deeper follow-on paper <http://pages.cs.wisc.edu/~arun/hamlet/OptFSSIGMOD.pdf> on this topic at SIGMOD in a few weeks. Interestingly, many other papers are starting to build on these initial ideas. There is still a bunch of theory to figure out, but as a research community we are collectively getting very close to nailing that.

In my keynote @ SIGMOD last year <http://dl.acm.org/citation.cfm?doid=2723372.2723374>, I talked about how theory (see papers above) has shown that with an extended relational algebraic core these seemingly different applications converge to a platform that is powered by a relational core. This converged platform is the long-term vision for Quickstep. Yup — I hear you, I need to write this up for the community. You are right and I’m adding it to my list :-) 

We have shown prototypes for all of the above, but haven’t put it all together. That is the hard part, and we are at the start of that journey. That effort is also revealing all kinds of interesting systems research issues — so good for the students on the project. Potentially exciting times ahead!

Cheers,
Jignesh 

> On Jun 14, 2016, at 2:32 PM, Julian Hyde <jh...@apache.org> wrote:
> 
> Having that representation reduces coupling in your architecture, so is useful even if you don’t decide to use a library for SQL parsing/planning. But I think once you have it you will realize that all of the interesting problems for the project happen after the query has been converted to algebra.
> 
> Julian

Re: [jira] [Commented] (QUICKSTEP-20) Add parser support for SQL window aggregation function

Posted by Julian Hyde <jh...@apache.org>.

The prerequisite for integrating a SQL parsing and/or planning front-end would be to have a lower-level interface than SQL. Imagine a text representation (in, say, JSON or XML) of queries in logical or physical relational algebra. (Logical would have the logical operators, e.g. Join, and physical would have, say, QuickstepHashJoin and QuickstepSortMergeJoin.)

Having that representation reduces coupling in your architecture, so is useful even if you don’t decide to use a library for SQL parsing/planning. But I think once you have it you will realize that all of the interesting problems for the project happen after the query has been converted to algebra.
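
As a rough illustration of the representation described above, here is a minimal Python sketch of a logical plan held as JSON-compatible data and rewritten into a physical plan. The field names ("op", "inputs", "condition", "table"), the example tables, and the operator name QuickstepScan are assumptions made up for this sketch; only Join and QuickstepHashJoin come from the message, and none of this reflects an actual Quickstep or Calcite format.

```python
import json

# Hypothetical logical plan. The field names and tables are illustrative only,
# not an actual Quickstep or Calcite plan format.
LOGICAL_PLAN = {
    "op": "Join",
    "condition": {"left": "orders.cust_id", "right": "customers.id"},
    "inputs": [
        {"op": "Scan", "table": "orders"},
        {"op": "Scan", "table": "customers"},
    ],
}

# Assumed mapping from logical to physical operator names.
# "QuickstepScan" is invented for this sketch.
PHYSICAL_OPS = {"Join": "QuickstepHashJoin", "Scan": "QuickstepScan"}


def to_physical(node):
    """Return a copy of the plan tree with logical operators replaced by physical ones."""
    physical = dict(node)
    physical["op"] = PHYSICAL_OPS[node["op"]]
    if "inputs" in node:
        physical["inputs"] = [to_physical(child) for child in node["inputs"]]
    return physical


def roundtrip(plan):
    """Serialize a plan to JSON text and parse it back, as a front-end and an engine would."""
    return json.loads(json.dumps(plan))


if __name__ == "__main__":
    # Print the physical plan as the text an execution engine could consume.
    print(json.dumps(to_physical(roundtrip(LOGICAL_PLAN)), indent=2))
```

Because the plan is plain text, a front-end written in another language (for example a Java-based planner) could in principle produce it, and the engine could consume it without sharing a runtime.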

Julian

> On Jun 14, 2016, at 12:22 PM, Jignesh Patel <ji...@pivotal.io> wrote:
> 
> Great point Julian and something that I have been thinking about too. 
> 
> I’d love to kick off a discussion to see how we can find a way to make this work. I’d love to give a talk to the Calcite team sometime later on this summer (July?) on Quickstep and explore this very synergy. 
> 
> Some of the ORCA guys have also been thinking about this. But, in the end it boils down to two things: a) Synergy and b) Some commitment from the two projects to make this work.
> 
> For Quickstep, the goal is quite clear: we want to focus on the key aspects of our platform that relate to fast query execution and flexible scheduling. But we need to do this in a way that is trivially easy for users.
> 
> Cheers,
> Jignesh  
> 
>> Julian Hyde commented on QUICKSTEP-20:
>> --------------------------------------
>> 
>> It sounds as if Quickstep is going down the route of building a full SQL parser, validator, planner. This is fine, but it is a huge amount of work to produce something that is high quality. Have you considered using Apache Calcite? Calcite is written in Java but that shouldn't be too much of an issue because Calcite can work as a pre-processor, producing a physical plan that can be run without any Java in the runtime.
>> 
>>> Add parser support for SQL window aggregation function
>>> ------------------------------------------------------
>>> 
>>>               Key: QUICKSTEP-20
>>>               URL: https://issues.apache.org/jira/browse/QUICKSTEP-20
>>>           Project: Apache Quickstep
>>>        Issue Type: New Feature
>>>        Components: Parser
>>>          Reporter: Shixuan Fan
>>>            Labels: features, newbie
>>> 
>>> This is the first part of window aggregation function support. New grammar will be introduced to the parser so that it can understand a window aggregation query.