Posted to issues@drill.apache.org by "Rahul Challapalli (JIRA)" <ji...@apache.org> on 2017/05/04 17:04:04 UTC

[jira] [Comment Edited] (DRILL-5329) External sort does not support "obscure" data types

    [ https://issues.apache.org/jira/browse/DRILL-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997031#comment-15997031 ] 

Rahul Challapalli edited comment on DRILL-5329 at 5/4/17 5:03 PM:
------------------------------------------------------------------

[~Paul.Rogers] Among the types that do not work, which ones do you plan to fix? Some Parquet writers generate Parquet files with some of these types. One approach would be to say that we support any type that Hive and Spark can generate, and work towards that goal. Thoughts?


was (Author: rkins):
[~Paul.Rogers] Among the types which do not work, what types do you plan to intend? There are parquet writers which generate parquet files with some of these types. One way would be to say that we support any types that can be generated by hive & spark and work towards that goal. Thoughts?

> External sort does not support "obscure" data types
> ---------------------------------------------------
>
>                 Key: DRILL-5329
>                 URL: https://issues.apache.org/jira/browse/DRILL-5329
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>
> A unit test was created to exercise the "Sorter" mechanism within the External Sort, which is used to sort each incoming batch. The sorter was tested with each Drill data type.
> The following types fail:
> * TINYINT
> * UINT1
> * SMALLINT
> * UINT2
> * UINT4
> * UINT8
> * VAR16CHAR
> * DECIMAL28SPARSE
> * DECIMAL38SPARSE
> The types that work include:
> * INT
> * BIGINT
> * FLOAT4
> * FLOAT8
> * DECIMAL9
> * DECIMAL18
> * VARCHAR
> * VARBINARY
> * DATE
> * TIME
> * TIMESTAMP
> * INTERVAL
> * INTERVALDAY
> * INTERVALYEAR
> Could not find a way to test the following:
> * DECIMAL28DENSE
> * DECIMAL38DENSE
> * LIST
> * MAP
> * GENERIC_OBJECT
> * UNION
> Not yet supported in Drill:
> * MONEY
> * FIXEDCHAR
> * FIXED16CHAR
> * FIXEDBINARY
> * NULL
> * TIMETZ
> * TIMESTAMPTZ
> * LATE
> The failure manifests in one of two ways:
> * If dynamic UDFs are enabled, the query crashes with an NPE. (See DRILL-5331.)
> * If dynamic UDFs are disabled, the generated code silently skips the comparison step, resulting in the sort not actually being done:
> Sorting a set of 20 pseudo-random rows produces the following output:
> {code}
> #, row #, key, value
> 0(0): 11, "0"
> 1(1): 14, "1"
> 2(2): 17, "2"
> 3(3): 0, "3"
> {code}
> By contrast, the (working) Int type produces the correct results:
> {code}
> #, row #, key, value
> 0(3): 0, "3"
> 1(10): 1, "10"
> 2(17): 2, "17"
> 3(4): 3, "4"
> {code}
> The first number is the output row index; the number in parentheses is the input row pointed to by the sv2 (selection vector) entry, which the sort should rewrite to create the sort order. The sort was done ASC, NULLS_HIGH, on the key field.
> A strong concern here is that there is no error or other warning to the user that Drill cannot sort this type; Drill just silently declines to perform the operation.
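
The silent failure described above can be illustrated with a minimal Java sketch. This is not Drill's generated code: the class `Sv2SortSketch`, its methods, and the `comparisonGenerated` flag are hypothetical names chosen for illustration. It assumes only that the sort works indirectly, by reordering a vector of row indices (like the sv2), and that a skipped comparison behaves like a comparator that treats every pair of rows as equal.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch; not part of Drill.
final class Sv2SortSketch {

    // Sorts row indices (a stand-in for the sv2) by key, ascending.
    // When comparisonGenerated is false, the comparator reports every
    // pair as equal, modeling generated code that skips the comparison.
    static int[] sortIndirect(int[] keys, boolean comparisonGenerated) {
        Integer[] sv2 = new Integer[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sv2[i] = i; // start with the identity mapping
        }
        Comparator<Integer> cmp = comparisonGenerated
                ? (a, b) -> Integer.compare(keys[a], keys[b])
                : (a, b) -> 0;
        // Arrays.sort on objects is stable, so an "all equal" comparator
        // leaves the indices in their original order.
        Arrays.sort(sv2, cmp);
        int[] out = new int[sv2.length];
        for (int i = 0; i < sv2.length; i++) {
            out[i] = sv2[i];
        }
        return out;
    }

    // Checks that reading keys through sv2 yields ascending order.
    // A check like this is one way a test could detect the silent failure.
    static boolean isSortedAscending(int[] keys, int[] sv2) {
        for (int i = 1; i < sv2.length; i++) {
            if (keys[sv2[i - 1]] > keys[sv2[i]]) {
                return false;
            }
        }
        return true;
    }
}
```

With the keys 11, 14, 17, 0 from the failing output, `sortIndirect(keys, false)` returns the indices unchanged (0, 1, 2, 3), matching the symptom in the report, while `sortIndirect(keys, true)` moves row 3 to the front. No exception is raised in either case, which is why the failure is easy to miss without an explicit order check.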


