Posted to dev@asterixdb.apache.org by Sandeep Joshi <sa...@gmail.com> on 2016/02/14 20:02:00 UTC

Question on language translation for Algebricks

I had some questions about the process of mapping other query languages to
Algebricks.  The Sigmod SoCC 15 paper mentions two languages, XQuery and
HiveQL, which have been mapped to Algebricks, but the implementations are
not found in either of the two repositories released under Apache.

I found Hivesterix and Pregelix under
https://github.com/madhusudancs/hyracks/tree/master/fullstack/hivesterix

I couldn't find the XQuery to Algebricks translator anywhere.  Has this
been released ?

What is the reason these language translators are not part of the Apache
repository ?

The Apache repositories contain the language translators for AQL and SQL.
After comparing the implementations for Hivesterix and SQL/AQL, here are
some questions:

1) Does one have to integrate the parser for a new language within the
Apache AsterixDB source tree, or can one build the Algebricks translator
outside the Apache tree and invoke the Hyracks job execution engine
directly, as is done in the Hivesterix implementation seen here?

https://github.com/madhusudancs/hyracks/blob/36bb1021b17b736aa1648bd439e1246ae419aa89/fullstack/hivesterix/hivesterix-dist/src/main/java/edu/uci/ics/hivesterix/runtime/exec/HyracksExecutionEngine.java

2) When a query language is converted to Algebricks, the ICompilerFactory
converts one plan tree to another by calling Visitor::visit() on each node
of the source query.  Does this imply that the plan tree for the source
language can only be constructed in Java?  Would it be difficult or
impossible to integrate a parser and plan tree generator written in
another language into Algebricks?

3) In the Apache repositories, the query rewrite rules which are used
during optimization are found under two different repositories.

One set is in the main AsterixDB repository:

https://github.com/apache/incubator-asterixdb/tree/master/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules

and the other is in the Hyracks repository:

https://github.com/apache/incubator-asterixdb-hyracks/tree/master/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules

Are these two sets of rules characteristically different or is this
duplication just an artifact of rapid prototyping ?

Furthermore, none of these rewrite rules seem to be SQL-specific.  Are
there any SQL-specific rewrite rules which were added ?

-Sandeep
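
To make question 2 concrete, here is a minimal sketch of visitor-driven plan
translation. It is written in Python for brevity and is not the Algebricks
API: Scan, Filter, LogicalOp and ToLogicalPlan are hypothetical names, and
the operator labels are only illustrative. Each source node dispatches to a
visitor method, which translates its input first and then wraps it in a
target node.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical source-language plan nodes (not real Hive or AQL classes).
@dataclass
class Scan:
    table: str

    def accept(self, visitor):
        return visitor.visit_scan(self)


@dataclass
class Filter:
    predicate: str
    child: object

    def accept(self, visitor):
        return visitor.visit_filter(self)


# Hypothetical target node, loosely modelled on a logical operator.
@dataclass
class LogicalOp:
    kind: str
    args: List[str]
    inputs: List["LogicalOp"]


class ToLogicalPlan:
    """Visitor that rebuilds the source tree bottom-up as LogicalOp nodes."""

    def visit_scan(self, node):
        return LogicalOp("DATASOURCESCAN", [node.table], [])

    def visit_filter(self, node):
        child = node.child.accept(self)  # translate the input first
        return LogicalOp("SELECT", [node.predicate], [child])


source = Filter("age > 30", Scan("users"))
plan = source.accept(ToLogicalPlan())
print(plan.kind, plan.inputs[0].kind)  # prints: SELECT DATASOURCESCAN
```

Nothing in the pattern itself is Java-specific; what matters is that the
translator can walk some in-memory representation of the source plan.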

Re: Question on language translation for Algebricks

Posted by Mike Carey <dt...@gmail.com>.

That's indeed the general intent!  There hasn't been any formal work 
done; given its current operators, the algebra is basically in the class 
of what one might loosely call "nested relational/schema optional 
languages".  In terms of computational capabilities, such languages are 
probably in the same class as the relational algebra extended with 
aggregation and grouping and ordering - again, roughly speaking.  
There's support for ordered as well as unordered nested collections, so 
that might allow certain kinds of queries that would be at least very 
awkward for bag of records data models and languages.  Not sure.  (I 
haven't followed what the ACM PODS community may have done regarding 
query language capabilities since the very early days, I fear.)  In 
terms of things like Pregelix or graph languages, what's missing for 
sure at the Algebricks level is the power offered by 
recursion/transitive closure.  Once upon a time we were thinking about 
going there, but haven't gotten back to that yet.  (And have no 
immediate plans to do so.)
Cheers,
Mike
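
As an illustration of the recursion point in the reply above, here is a
small sketch of transitive closure computed as a fixpoint. It is plain
Python, not Algebricks code, and the function name is made up for this
example. The number of join rounds depends on the data (the longest path in
the graph), which is why a plan with a fixed number of joins cannot express
it in general.

```python
def transitive_closure(edges):
    """Semi-naive fixpoint: extend newly found pairs until nothing new appears."""
    closure = set(edges)
    frontier = set(edges)
    while frontier:
        # One "join" round: extend each new pair (a, b) by an edge (b, c).
        derived = {(a, c) for (a, b) in frontier for (b2, c) in edges if b == b2}
        frontier = derived - closure  # keep only pairs not seen before
        closure |= frontier
    return closure


chain = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(chain)))
# prints: [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The loop terminates on cyclic graphs as well, because only pairs that are
not already in the closure are carried into the next round.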


On 2/15/16 12:41 AM, Sandeep Joshi wrote:
> Thanks for all the helpful answers.  I will check out vxquery next.
>
> Algebricks seems like the equivalent of LLVM for query languages.   I am
> wondering if Algebricks is powerful enough to map any query language, be it
> graph-based, relational or hierarchical (mrql, sql, pregel).  Is there a
> formal proof of this expressive power ?   Will there always be a one-to-one
> correspondence between the plan trees of different languages or would there
> be a case where one would have to expand to look at sub-trees while doing
> query translation ?
>
> -Sandeep

Re: Question on language translation for Algebricks

Posted by Sandeep Joshi <sa...@gmail.com>.

Thanks for all the helpful answers.  I will check out VXQuery next.

Algebricks seems like the equivalent of LLVM for query languages.   I am
wondering if Algebricks is powerful enough to map any query language, be it
graph-based, relational or hierarchical (mrql, sql, pregel).  Is there a
formal proof of this expressive power ?   Will there always be a one-to-one
correspondence between the plan trees of different languages or would there
be a case where one would have to expand to look at sub-trees while doing
query translation ?

-Sandeep



Re: Question on language translation for Algebricks

Posted by Mike Carey <dt...@gmail.com>.

PS: There's an important point below that you shouldn't miss (Sandeep) 
if you look at the Hivesterix code - if you find its approach puzzling, 
note that it was designed to only add what was needed to run Hive 
queries on Hyracks - and so that it could potentially be kept in 
upper-level sync with Hive itself.  As a result, it was not done as a 
"Hive lookalike done right" - it was done as a "Hive lookalike that lets 
the existing Hive code do as much of the initial work as possible".


On 2/14/16 2:48 PM, Yingyi Bu wrote:
> Hi Sandeep,
>
> Here is the Hivesterix codebase in the Apache source tree:
> https://github.com/apache/incubator-asterixdb-hyracks/tree/fullstack-0.2.13
>
> We have maintained Hivesterix up to hyracks-0.2.13, but stopped maintaining
> it after that release. Mike has elaborated on the reason.
>
>>> Furthermore, none of these rewrite rules seem to be SQL-specific.  Are
>>> there any SQL-specific rewrite rules which were added?
>
> That's exactly the motivation of the Algebricks project --- most rules that
> a typical SQL compiler implements are not SQL-specific :-)
> However, there are indeed a few Hive-specific rules that I added in order to
> get the Hive-on-Algebricks plan to work efficiently:
> https://github.com/apache/incubator-asterixdb-hyracks/tree/fullstack-0.2.13/hivesterix/hivesterix-optimizer/src/main/java/edu/uci/ics/hivesterix/optimizer/rules
>
> The Hivesterix implementation first translates a Hive-optimized MR plan
> into an Algebricks logical plan, then lets Algebricks do further
> optimizations, and finally executes the resulting Hyracks job on the Hyracks
> runtime.
>
> Best,
> Yingyi

Re: Question on language translation for Algebricks

Posted by Yingyi Bu <bu...@gmail.com>.

Hi Sandeep,

Here is the Hivesterix codebase in the Apache source tree:
https://github.com/apache/incubator-asterixdb-hyracks/tree/fullstack-0.2.13

We have maintained Hivesterix up to hyracks-0.2.13, but stopped maintaining
it after that release. Mike has elaborated on the reason.

>> Furthermore, none of these rewrite rules seem to be SQL-specific.  Are
>> there any SQL-specific rewrite rules which were added?

That's exactly the motivation of the Algebricks project --- most rules that
a typical SQL compiler implements are not SQL-specific :-)
However, there are indeed a few Hive-specific rules that I added in order to
get the Hive-on-Algebricks plan to work efficiently:
https://github.com/apache/incubator-asterixdb-hyracks/tree/fullstack-0.2.13/hivesterix/hivesterix-optimizer/src/main/java/edu/uci/ics/hivesterix/optimizer/rules

The Hivesterix implementation first translates a Hive-optimized MR plan
into an Algebricks logical plan, then lets Algebricks do further
optimizations, and finally executes the resulting Hyracks job on the Hyracks
runtime.

Best,
Yingyi
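
The layering described above (shared rules plus a few language-specific
ones) can be sketched as follows. This is a toy rewriter in Python, not the
Algebricks rule framework: Op, merge_selects, drop_hive_noop and the
HIVE_NOOP operator are all hypothetical and exist only for illustration.
A language front end reuses the generic rule list and appends its own rules.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Op:
    kind: str
    arg: str = ""
    inputs: Tuple["Op", ...] = ()


def merge_selects(op):
    """Generic rule (hypothetical): SELECT(p, SELECT(q, x)) -> SELECT(p AND q, x)."""
    if op.kind == "SELECT" and op.inputs and op.inputs[0].kind == "SELECT":
        inner = op.inputs[0]
        return Op("SELECT", f"({op.arg}) AND ({inner.arg})", inner.inputs)
    return None  # rule does not apply


def drop_hive_noop(op):
    """Language-specific rule (hypothetical): remove a pass-through operator."""
    if op.kind == "HIVE_NOOP":
        return op.inputs[0]
    return None


def rewrite(op, rules):
    """Rewrite the inputs first, then apply rules at this node until none fires."""
    op = Op(op.kind, op.arg, tuple(rewrite(i, rules) for i in op.inputs))
    changed = True
    while changed:
        changed = False
        for rule in rules:
            result = rule(op)
            if result is not None:
                op, changed = result, True
    return op


generic_rules = [merge_selects]
hive_rules = generic_rules + [drop_hive_noop]  # augment, do not replace

plan = Op("SELECT", "a > 1",
          (Op("HIVE_NOOP", "", (Op("SELECT", "b < 5", (Op("SCAN", "t"),)),)),))
optimized = rewrite(plan, hive_rules)
print(optimized.arg)  # prints: (a > 1) AND (b < 5)
```

With only generic_rules the HIVE_NOOP node stays in place and blocks the
merge, which is the kind of situation where a language-specific rule helps
the shared rules do their work.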




Re: Question on language translation for Algebricks

Posted by Mike Carey <dt...@gmail.com>.

Sandeep,

Just to chime in as well:

  - VXQuery is indeed the best example to look at, probably, to 
understand the AsterixDB/Algebricks separation.

  - Hivesterix was built by Yingyi Bu (who'll see this) early on - it 
drove the separation idea, actually, but we made a decision not to try 
and maintain it.  It was intended to provide a third/different proof of 
separation and applicability of the approach, from a research 
standpoint, but doesn't have additional value to offer the world (since 
Hive itself is a moving target and Hive on Tez now provides the 
non-MapReduce-runtime value that Hivesterix initially offered).  Yingyi 
would probably be happy to share the code base with you if you wanted to 
look at it for any reason, but the only things in the Apache AsterixDB 
(incubating) project are things deemed worthy of engineering/maintenance 
work.

Hope that helps too!

Cheers,
Mike


On 2/14/16 11:47 AM, Till Westmann wrote:
> Hi Sandeep,
>
> Apache VXQuery, the XQuery implementation mentioned in the SoCC paper, 
> is a separate project [1].
>
> Specifically to your questions:
>
> 1) There is no need to implement other projects that use Algebricks 
> inside of the AsterixDB source tree (as VXQuery shows).
>
> 2) It is clearly easier to combine a Java parser and plan tree 
> generator with Algebricks, but there's no reason why one couldn't 
> connect to other languages (e.g. by using a text-based intermediate 
> format between the parser and the optimizer and between the plan 
> generator and the runtime).
>
> 3) The reason for the two sets of rules is that some are 
> language-agnostic and some are language-specific. As you can see in 
> figure 2 of the paper, a language implementation has to provide 
> language-specific rules to augment the language-agnostic rules 
> provided by Algebricks.
> Specifically, the rules in AsterixDB's asterix-algebra project augment 
> the rules in Algebricks to support AsterixDB's query language AQL.
>
> Hope this helps,
> Till
>
> [1] http://vxquery.apache.org

Re: Question on language translation for Algebricks

Posted by Till Westmann <ti...@apache.org>.

Hi Sandeep,

Apache VXQuery, the XQuery implementation mentioned in the SoCC paper, 
is a separate project [1].

Specifically to your questions:

1) There is no need to implement other projects that use Algebricks 
inside of the AsterixDB source tree (as VXQuery shows).

2) It is clearly easier to combine a Java parser and plan tree generator 
with Algebricks, but there's no reason why one couldn't connect to other 
languages (e.g. by using a text-based intermediate format between the 
parser and the optimizer and between the plan generator and the 
runtime).

3) The reason for the two sets of rules is that some are 
language-agnostic and some are language-specific. As you can see in 
figure 2 of the paper, a language implementation has to provide 
language-specific rules to augment the language-agnostic rules provided 
by Algebricks.
Specifically, the rules in AsterixDB's asterix-algebra project augment 
the rules in Algebricks to support AsterixDB's query language AQL.
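
To make the layering in point 3 concrete, here is a minimal, self-contained 
sketch. It is not code from Algebricks or AsterixDB: the RewriteRule 
interface, the two toy rules, and the representation of a plan as a list 
of operator strings are all assumptions made up for illustration. It only 
shows the idea that a language implementation appends its own rules to a 
shared, language-agnostic set and runs the combined list to a fixpoint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RuleLayeringSketch {

    /** Hypothetical rewrite rule: returns a (possibly rewritten) copy of the plan. */
    interface RewriteRule {
        List<String> apply(List<String> plan);
    }

    /** Stand-in for a language-agnostic rule: drop selects whose condition is always true. */
    static final RewriteRule REMOVE_TRIVIAL_SELECT = plan -> {
        List<String> out = new ArrayList<>(plan);
        out.removeIf(op -> op.equals("select(true)"));
        return out;
    };

    /** Stand-in for a language-specific rule: turn a made-up dataset(...) call into a scan. */
    static final RewriteRule DATASET_TO_SCAN = plan -> {
        List<String> out = new ArrayList<>();
        for (String op : plan) {
            out.add(op.startsWith("dataset(")
                    ? "scan(" + op.substring("dataset(".length())
                    : op);
        }
        return out;
    };

    /** Rules the shared layer would provide (assumption: just one toy rule here). */
    static List<RewriteRule> genericRules() {
        return Arrays.asList(REMOVE_TRIVIAL_SELECT);
    }

    /** The language implementation augments the generic rules with its own. */
    static List<RewriteRule> allRules() {
        List<RewriteRule> rules = new ArrayList<>(genericRules());
        rules.add(DATASET_TO_SCAN);
        return rules;
    }

    /** Apply every rule repeatedly until no rule changes the plan any more. */
    static List<String> optimize(List<String> plan, List<RewriteRule> rules) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (RewriteRule rule : rules) {
                List<String> next = rule.apply(plan);
                if (!next.equals(plan)) {
                    plan = next;
                    changed = true;
                }
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> plan = Arrays.asList("project(name)", "select(true)", "dataset(Users)");
        System.out.println(optimize(plan, allRules()));
    }
}
```

In this sketch a second language would reuse genericRules() unchanged and 
supply a different list of its own rules, which is the split between the 
two rule directories described above.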

Hope this helps,
Till

[1] http://vxquery.apache.org
