Posted to issues@drill.apache.org by "Jason Altekruse (JIRA)" <ji...@apache.org> on 2016/03/14 19:50:33 UTC

[jira] [Commented] (DRILL-4505) Can't group by or sort across files with different schema

    [ https://issues.apache.org/jira/browse/DRILL-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193881#comment-15193881 ] 

Jason Altekruse commented on DRILL-4505:
----------------------------------------

[~tobad357] Can you try to add a cast to your column APPLICATION_ID? Work is ongoing to fully support changing schema, which includes a concept of an untyped null that tries to defer materialization until it is needed. In this case I believe it is possible that we are materializing the column that does not appear in some of the files to a default type (we arbitrarily chose nullable bigint before starting work on the full changing schema support). Casting these automatically materialized nulls to the correct type may resolve the issue you are seeing.
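
As a rough sketch of the cast suggested above, the failing query from the report could be rewritten as follows. The target type VARCHAR is only an assumption for illustration; substitute whatever type APPLICATION_ID actually has in the files that contain it. Whether this avoids the schema-change error has not been verified here.

```sql
-- Sketch only: VARCHAR is an assumed placeholder type for APPLICATION_ID.
-- The explicit cast gives the column one type in every file, including
-- files where it is missing and would otherwise default to nullable bigint.
SELECT max(CAST(APPLICATION_ID AS VARCHAR)) AS max_application_id,
       dir0 AS year_
FROM dfs.`/PRO/UTC/1`
WHERE dir2 >= '2016-01-01' AND dir2 < '2016-04-02'
GROUP BY dir0;
```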

If this doesn't fix the issue, you can try to enable the union type, but it is currently considered an experimental feature and needs to be more thoroughly tested.

alter session set `exec.enable_union_type` = true

https://issues.apache.org/jira/browse/DRILL-3229

> Can't group by or sort across files with different schema
> ---------------------------------------------------------
>
>                 Key: DRILL-4505
>                 URL: https://issues.apache.org/jira/browse/DRILL-4505
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.5.0
>         Environment: Java 1.8
>            Reporter: Tobias
>
> We are currently trying out the support for querying across parquet files with different schemas.
> Simple selects work well, but when we want to sort or group by, Drill returns "UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts with changing schemas Fragment 0:0 [Error Id: ff490670-64c1-4fb8-990e-a02aa44ac010 on zookeeper-1:31010]"
> This is despite not even including the new columns in the query.
> The expected result would be to treat columns that do not exist in certain files as either null or a default value, and to allow them to be grouped and sorted
> Example
> SELECT APPLICATION_ID ,dir0 AS year_ FROM dfs.`/PRO/UTC/1` WHERE dir2 >='2016-01-01' AND dir2<'2016-04-02' works with changing schema
> but SELECT max(APPLICATION_ID ),dir0 AS year_ FROM dfs.`/PRO/UTC/1` WHERE dir2 >='2016-01-01' AND dir2<'2016-04-02'  group by dir0 does not work
> For us this rules out an evolving schema for even moderately complex queries


