You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Igor Guzenko (JIRA)" <ji...@apache.org> on 2019/08/15 18:40:00 UTC
[jira] [Commented] (CALCITE-3138) RelStructuredTypeFlattener
doesn't restructure ROW type fields
[ https://issues.apache.org/jira/browse/CALCITE-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908399#comment-16908399 ]
Igor Guzenko commented on CALCITE-3138:
---------------------------------------
[~julianhyde], I didn't report the Jira, so I thought the title is ok. Could you please check the new one. If it's fine for you, I'll update PR and commit message accordingly tomorrow.
> RelStructuredTypeFlattener doesn't restructure ROW type fields
> ---------------------------------------------------------------
>
> Key: CALCITE-3138
> URL: https://issues.apache.org/jira/browse/CALCITE-3138
> Project: Calcite
> Issue Type: Bug
> Components: core
> Reporter: Haisheng Yuan
> Assignee: Igor Guzenko
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.21.0
>
> Attachments: ROW_repro.patch
>
> Time Spent: 4h
> Remaining Estimate: 0h
>
> 1) RelStructuredTypeFlattener.restructureFields(structType) doesn't support ROW type. However, ROW type is flattened by
> RelStructuredTypeFlattener just like struct. So when user queries one column with complex type ROW, after flattening and restructuring
> top level project returns reference to first inner primitive field of the requested column.
> 2) Another finding is related to ITEM expression applied to array of structs column. For example, let's imagine a table with column of
> type ARRAY<STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>>. When user requests is SQL array_column[1], Calcite generates ITEM($0, 1) , where $0 is ref to array column
> from Scan and 1 is index literal. Current flattener generates two field acess expressions ITEM($0, 1).a, ITEM($0, 1).b but dont take into account
> that ITEM($0, 1).b returns struct which also should be flattened.
> 3) In some cases applying of ITEM after flattenning is no longer possible. For example, consider column with type
> STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>. User requests column['b'] in query and Calcite creates ITEM($0,'b').
> After flattenning Scan is covered by LogicalProject($0.a, $0.b.x, $0.b.y) and the old projection ITEM($0,'b') can't
> be applied anymore. So now it should be converted to refer only subset of fields ($1,$2) from flattening project.
> UPDATES IN EXPECTED TEST RESULTS:
> ----------------------------------------------------------------------------------
> TEST CASE: SqlToRelConverterTest.testNestedColumnType()
> {code}
> select empa.home_address.zip from sales.emp_address empa where empa.home_address.city = 'abc'
> {code}
> OLD RESULT:
> {code}
> LogicalProject(ZIP=[$4])
> LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))])
> LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE])
> LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
> {code}
> 1. Above in logical filter, condition references to field $3 which is ZIP=[$1.ZIP] field from previous project,
> however in original query filtering should be done by CITY field.
> 2. Also the top level project references to $4 field, which is STATE=[$1.STATE] field from project, but original
> query requested ZIP field.
>
> UPDATED RESULT:
> {code}
> LogicalProject(ZIP=[$3])
> LogicalFilter(condition=[=($2, 'abc')])
> LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE])
> LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
> {code}
> ----------------------------------------------------------------------------------
> TEST CASE: SqlToRelConverterTest.testStructTypeAlias()
> {code}
> select t.r AS myRow from (select row(row(1)) r from dept) t
> {code}
> OLD RESULT:
> {code}
> LogicalProject(MYROW$$0$$0=[1])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> 1. Inside the subselect of row(row(1)) type of returned column is RecordType(RecordType(INTEGER EXPR$0) EXPR$0),
> but the top level project uses flattened expression and returns to user literal 1 with type RecordType(INTEGER MYROW$$0$$0),
> although the type doesn't match type returned by row(row(1)) expression.
> 2. Also it's suspicious that caller expects returned column to have name 'myRow' but gets 'MYROW$$0$$0'.
>
> UPDATED RESULT:
> {code}
> LogicalProject(MYROW=[ROW(ROW($0))])
> LogicalProject(MYROW$$0$$0=[1])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> ----------------------------------------------------------------------------------
> TEST CASE: SqlToRelConverterTest.testFlattenRecords()
> {code}
> select employees[1] from dept_nested
> {code}
> OLD RESULT:
> {code}
> LogicalProject(EXPR$0=[$0])
> LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], EXPR$0$2=[ITEM($3, 1).DETAIL])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
> {code}
> 1. Given selection of element by index from array, which by definition contain elements of type:
> {code}
> STRUCT<EMPNO:INT,
> ENAME:VARCHAR(10),
> DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10),
> DESC:VARCHAR(20),
> OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>>
> {code}
> First problem is that the intermediate project on top of the scan expected to return only flattened types.
> But last projection expression returns `DETAIL` field of array element which has type STRUCT and contain array
> field named 'SKILLS'.
> 2. The top level project returns incorrect flattened EMPNO:INT field from retrieved array element instead
> of whole array element of type struct.
>
> UPDATED RESULT:
> {code}
> LogicalProject(EXPR$0=[ROW($0, $1, ROW($2))])
> LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
> {code}
> Other few updates in RelToSqlConverterStructsTest.java and PlannerTest.java are more explicit and visible in PR.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)