Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2016/05/25 15:45:13 UTC
[jira] [Commented] (DRILL-4693) Incorrect column ordering when CONVERT_FROM() json is used

    [ https://issues.apache.org/jira/browse/DRILL-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300247#comment-15300247 ] 

Aman Sinha commented on DRILL-4693:
-----------------------------------

Based on an initial investigation, the behavior can be explained as follows. The planner generates the top-level Project as col1, col2, col3. At run time, however, the ProjectRecordBatch produces the schema in a different order (col1, col3, col2), because the CONVERT_FROM() function is handled in a deferred manner compared to the other expressions in the project list: the CONVERT_FROM() expressions are appended to the VectorContainer. This is still OK at the operator level, because Drill execution there is based on field names rather than column ordinals.
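
To make the reordering concrete, here is a minimal, self-contained sketch of the deferred handling described above. It is not Drill's actual ProjectRecordBatch code: the class DeferredProjectSketch, the Expr type, and the buildContainer method are hypothetical stand-ins, and a LinkedHashMap plays the role of the VectorContainer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DeferredProjectSketch {

    // Hypothetical stand-in for one entry in the Project's expression list.
    static final class Expr {
        final String name;
        final String value;
        final boolean complex; // true for e.g. convert_from(..., 'json')

        Expr(String name, String value, boolean complex) {
            this.name = name;
            this.value = value;
            this.complex = complex;
        }
    }

    // Simple expressions are added to the container as they are encountered;
    // complex ones are set aside and appended once the loop is done.
    static Map<String, String> buildContainer(List<Expr> projectList) {
        Map<String, String> container = new LinkedHashMap<>();
        List<Expr> deferred = new ArrayList<>();
        for (Expr e : projectList) {
            if (e.complex) {
                deferred.add(e);
            } else {
                container.put(e.name, e.value);
            }
        }
        for (Expr e : deferred) {
            container.put(e.name, e.value);
        }
        return container;
    }

    public static void main(String[] args) {
        List<Expr> projectList = Arrays.asList(
            new Expr("col1", "abc", false),
            new Expr("col2", "{\"x\":\"y\"}", true),
            new Expr("col3", "xyz", false));

        Map<String, String> container = buildContainer(projectList);

        // Positional order differs from the planned order col1, col2, col3.
        System.out.println(container.keySet());    // prints [col1, col3, col2]
        // Lookup by name is unaffected by the reordering.
        System.out.println(container.get("col2")); // prints {"x":"y"}
    }
}
```

The sketch shows why name-based operators are unaffected while anything that reads the columns positionally, such as the final result display, sees col1, col3, col2.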

Note that the planner initially inserts an extra Project at the top level for final column re-ordering (see https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java#L504). However, this Project is subsequently dropped as a 'trivial' Project because, from the planner's point of view, the child already produces the columns in the same order. We may want to treat this as a non-trivial Project whenever a complex function such as convert_from with json is present. However, I am not yet sure whether additional changes would also be needed in ProjectRecordBatch.
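
As a rough illustration of the proposed change, the sketch below contrasts a plain "trivial Project" test with a stricter one that keeps the Project when a complex function is present. It is only a sketch under stated assumptions: TrivialProjectSketch, isTrivial, and isRemovable are hypothetical names, field lists are modeled as lists of strings, and none of this is the planner's actual rule code.

```java
import java.util.Arrays;
import java.util.List;

class TrivialProjectSketch {

    // Hypothetical check: a Project is "trivial" when it forwards exactly the
    // fields the child is planned to produce, in the same order.
    static boolean isTrivial(List<String> projectFields, List<String> childFields) {
        return projectFields.equals(childFields);
    }

    // Hypothetical stricter check: never drop the Project when the child
    // evaluates a complex function (e.g. convert_from with json), because the
    // child's run-time column order may differ from its planned order.
    static boolean isRemovable(List<String> projectFields,
                               List<String> childFields,
                               boolean childHasComplexFunction) {
        return !childHasComplexFunction && isTrivial(projectFields, childFields);
    }

    public static void main(String[] args) {
        List<String> topProject = Arrays.asList("col1", "col2", "col3");
        List<String> plannedChild = Arrays.asList("col1", "col2", "col3");

        // Current behavior as described: the planned orders match, so the
        // re-ordering Project counts as trivial and is dropped.
        System.out.println(isTrivial(topProject, plannedChild));         // prints true

        // Proposed behavior: the complex function keeps the Project in place.
        System.out.println(isRemovable(topProject, plannedChild, true)); // prints false
    }
}
```

Whether retaining the Project is sufficient on its own is left open here, since ProjectRecordBatch may need further changes.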


> Incorrect column ordering when CONVERT_FROM() json is used 
> -----------------------------------------------------------
>
>                 Key: DRILL-4693
>                 URL: https://issues.apache.org/jira/browse/DRILL-4693
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators, Query Planning & Optimization
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>
> For the following query, the column order in the results is wrong; it should be col1, col2, col3.
> {noformat}
> 0: jdbc:drill:zk=local> select 'abc' as col1, convert_from('{"x" : "y"}', 'json') as col2, 'xyz' as col3 from cp.`tpch/region.parquet`;
> +-------+-------+------------+
> | col1  | col3  |    col2    |
> +-------+-------+------------+
> | abc   | xyz   | {"x":"y"}  |
> | abc   | xyz   | {"x":"y"}  |
> | abc   | xyz   | {"x":"y"}  |
> | abc   | xyz   | {"x":"y"}  |
> | abc   | xyz   | {"x":"y"}  |
> +-------+-------+------------+
> {noformat}
> The EXPLAIN plan:
> {noformat}
> 0: jdbc:drill:zk=local> explain plan for select 'abc' as col1, convert_from('{"x" : "y"}', 'json') as col2, 'xyz' as col3 from cp.`tpch/region.parquet`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(col1=['abc'], col2=[CONVERT_FROMJSON('{"x" : "y"}')], col3=['xyz'])
> 00-02        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, usedMetadataFile=false, columns=[]]])
> {noformat}
> This happens on the current master branch as well as on 1.6.0 and earlier releases (I checked 1.4.0, which shows the same behavior), so it is a pre-existing bug.


