Posted to issues@calcite.apache.org by "Khai Tran (Jira)" <ji...@apache.org> on 2019/08/21 17:41:00 UTC

[jira] [Comment Edited] (CALCITE-3122) Convert Pig Latin scripts into Calcite logical plan

    [ https://issues.apache.org/jira/browse/CALCITE-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912542#comment-16912542 ] 

Khai Tran edited comment on CALCITE-3122 at 8/21/19 5:40 PM:
-------------------------------------------------------------

[~julianhyde] I fixed and pushed the code to address all of your comments from 03/Aug. My answers to a few other questions:
 "Can you expand/clarify the javadoc of VirtualTable? I don't really understand what it is or why you need it" => These tables are needed to construct a Calcite plan from a Pig DAG. Their schemas (or row types, to be precise) are obtained by converting Pig schemas into RelDataType. The tables are neither queryable nor scannable; they exist only to represent the table schemas that other transformations build on.

The reason I named it VirtualTable and moved it to core is that I need to use it later for other use cases at LinkedIn. For example, we parse GraphQL queries into Calcite plans and convert them into SparkSQL for batch execution. So we may have a full story of online, nearline, and offline convergence, with Calcite relational algebra as the IR. I may present this during my talk at ApacheCon next month.

Anyway, I have renamed it to PigTable and moved it back to Piglet for now so that we can proceed.

I will work on the two remaining issues (the test for ToLogicalPlan and the planner issue) later today.
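
For readers following the VirtualTable/PigTable discussion, here is a minimal sketch of the "schema-only table" idea: a table that exposes a row type derived from a Pig schema but deliberately implements no scan capability. Everything in it is an assumption made for illustration. The Table and ScannableTable interfaces are simplified stand-ins, not Calcite's interfaces; the class names are hypothetical; and the Pig-to-SQL type mapping covers only a handful of scalar types and is not necessarily the mapping used in the patch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class SchemaOnlyTableSketch {
  /** Simplified stand-in (NOT Calcite's Table): column name -> SQL type name. */
  interface Table {
    Map<String, String> getRowType();
  }

  /** Simplified stand-in for a table that can also produce rows. */
  interface ScannableTable extends Table {
    Iterable<Object[]> scan();
  }

  /** Schema-only table: it has a row type but implements no scan capability. */
  static final class SchemaOnlyTable implements Table {
    private final Map<String, String> rowType;

    /** Takes a Pig schema as field name -> Pig type name, in declaration order. */
    SchemaOnlyTable(Map<String, String> pigSchema) {
      Map<String, String> converted = new LinkedHashMap<>();
      for (Map.Entry<String, String> field : pigSchema.entrySet()) {
        converted.put(field.getKey(), toSqlType(field.getValue()));
      }
      this.rowType = Collections.unmodifiableMap(converted);
    }

    @Override public Map<String, String> getRowType() {
      return rowType;
    }
  }

  /** Illustrative mapping for a few Pig scalar types (an assumption). */
  static String toSqlType(String pigType) {
    switch (pigType) {
      case "int": return "INTEGER";
      case "long": return "BIGINT";
      case "double": return "DOUBLE";
      case "chararray": return "VARCHAR";
      case "boolean": return "BOOLEAN";
      default:
        throw new IllegalArgumentException("Unmapped Pig type: " + pigType);
    }
  }

  public static void main(String[] args) {
    Map<String, String> pigSchema = new LinkedHashMap<>();
    pigSchema.put("id", "int");
    pigSchema.put("name", "chararray");
    Table table = new SchemaOnlyTable(pigSchema);
    // The row type is available to plan construction...
    System.out.println(table.getRowType());
    // ...but the table offers nothing to scan.
    System.out.println(table instanceof ScannableTable);
  }
}
```

The design point is that plan construction only needs the row type, so the table can omit any data-access interface entirely instead of implementing one that throws at run time.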


> Convert Pig Latin scripts into Calcite logical plan 
> ----------------------------------------------------
>
>                 Key: CALCITE-3122
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3122
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core, piglet
>            Reporter: Khai Tran
>            Assignee: Julian Hyde
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.21.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We created an internal Calcite repo at LinkedIn and developed APIs to parse any Pig Latin script into a Calcite logical plan. The code was tested on nearly 1,000 Pig scripts written at LinkedIn.
> Changes:
> 1. piglet: the main conversion code lives here, including:
>  * APIs to convert any Pig script into RelNode plans or SQL statements
>  * Use the Pig Grunt parser to parse Pig Latin scripts into Pig logical plans (DAGs)
>  * Convert Pig schemas into RelDataType
>  * Traverse Pig expression plans and convert Pig expressions into RexNodes
>  * Map some basic Pig UDFs to Calcite SQL operators
>  * Build Calcite UDFs for any other Pig UDFs, including UDFs written in Java and Python
>  * Traverse (DFS) Pig logical plans to convert each Pig logical node into a RelNode
>  * Add an optimizer rule that converts Pig group/cogroup into Aggregate operators
> 2. core:
>  * Implement other RelNodes in Rel2Sql so that Pig can be translated into SQL
>  * Other minor changes in a few other classes to make the Pig-to-Calcite conversion work
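
The depth-first traversal mentioned in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: PigOp and RelOp are hypothetical stand-ins for Pig logical operators and Calcite RelNodes, and the operator mapping (LOAD to TableScan, FILTER to Filter, FOREACH to Project, JOIN to Join) is a deliberate simplification. Because a Pig logical plan is a DAG, the sketch memoizes converted nodes so that an operator shared by several consumers is converted only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class PigDagConversionSketch {
  /** Hypothetical Pig logical operator: a kind plus the operators feeding it. */
  static final class PigOp {
    final String kind;
    final List<PigOp> inputs;

    PigOp(String kind, PigOp... inputs) {
      this.kind = kind;
      this.inputs = Arrays.asList(inputs);
    }
  }

  /** Hypothetical relational node produced by the conversion. */
  static final class RelOp {
    final String kind;
    final List<RelOp> inputs;

    RelOp(String kind, List<RelOp> inputs) {
      this.kind = kind;
      this.inputs = inputs;
    }

    @Override public String toString() {
      return inputs.isEmpty() ? kind : kind + inputs;
    }
  }

  // Identity-based memo: each Pig operator instance is converted at most once.
  private final Map<PigOp, RelOp> converted = new IdentityHashMap<>();
  int conversions = 0; // number of operators actually converted

  /** Depth-first conversion: inputs are converted before the node itself. */
  RelOp convert(PigOp op) {
    RelOp done = converted.get(op);
    if (done != null) {
      return done; // already reached through another path in the DAG
    }
    List<RelOp> relInputs = new ArrayList<>();
    for (PigOp input : op.inputs) {
      relInputs.add(convert(input));
    }
    RelOp rel = new RelOp(toRelKind(op.kind), relInputs);
    converted.put(op, rel);
    conversions++;
    return rel;
  }

  /** Simplified operator mapping (an assumption for illustration). */
  static String toRelKind(String pigKind) {
    switch (pigKind) {
      case "LOAD": return "TableScan";
      case "FILTER": return "Filter";
      case "FOREACH": return "Project";
      case "JOIN": return "Join";
      default:
        throw new IllegalArgumentException("Unsupported Pig operator: " + pigKind);
    }
  }

  public static void main(String[] args) {
    // One LOAD feeds two branches that are joined again, so the plan is a DAG.
    PigOp load = new PigOp("LOAD");
    PigOp filter = new PigOp("FILTER", load);
    PigOp project = new PigOp("FOREACH", load);
    PigOp join = new PigOp("JOIN", filter, project);

    PigDagConversionSketch converter = new PigDagConversionSketch();
    System.out.println(converter.convert(join));
    System.out.println("operators converted: " + converter.conversions);
  }
}
```

Starting the traversal from the plan's sink and recursing into inputs guarantees that every input is converted before the operator that consumes it, which is the order a bottom-up plan builder needs.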



--
This message was sent by Atlassian Jira
(v8.3.2#803003)