You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/03/27 21:16:53 UTC
[jira] [Commented] (DRILL-1499) Different column order could appear in the result set for a schema-less select * query, even there are no changing schemas.

    [ https://issues.apache.org/jira/browse/DRILL-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384534#comment-14384534 ] 

Victoria Markman commented on DRILL-1499:
-----------------------------------------

I just ran into this again, in completely different context and it only happened when I had Exchange in my plan.

{code:sql}
select     *
from  	test  left outer join  j2
on (COALESCE(test.c_integer, 10000) = COALESCE(j2.c_integer, 10000))
where
	test.c_integer IS NULL 
	and j2.c_integer IS NULL
{code}

If we are not planning to fix it ever, we need to make sure it is documented in our SQL section on differences between us and SQL standard along with NULL behavior.

> Different column order could appear in the result set for a schema-less select * query, even there are no changing schemas.
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-1499
>                 URL: https://issues.apache.org/jira/browse/DRILL-1499
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Steven Phillips
>             Fix For: Future
>
>
> For a select * query referring to a schema-less table, Drill could return different column, depending on the physical operators the query involves:
> Q1:
> {code}
> select * from cp.`employee.json` limit 3;
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> | employee_id | full_name  | first_name | last_name  | position_id | position_title |  store_id  | department_id | birth_date | hire_date  |   salary   | supervisor_id | education_level | marital_status |   gender   | management_role |
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> {code}
> Q2:
> {code}
> select * from cp.`employee.json` order by last_name limit 3;
> +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+
> | birth_date | department_id | education_level | employee_id | first_name | full_name  |   gender   | hire_date  | last_name  | management_role | marital_status | position_id | position_title |   salary   |  store_id  | supervisor_id |
> +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+
> {code}
> The difference between Q1 and Q2 is the order by clause.  With order by clause in Q2, Drill will sort the column names alphabetically, while for Q1, the column names are in the same order as in the data source. 
> The underlying cause for such difference is that the sort or sort-based merger operator would require canonicalization, since the incoming batches could contain different schemas. 
>  However, it would be better that such canonicalization is used only when the incoming batches have changing schemas. If all the incoming batches have identical schemas, no need to sort the column orders.  With this fix, Drill will present the same column order in the result set, for a schema-less select * query,  if there is no changing schemas from incoming data sources. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)