You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Rahul Challapalli (JIRA)" <ji...@apache.org> on 2017/03/28 20:39:41 UTC
[jira] [Closed] (DRILL-4704) select statement behavior is
inconsistent for decimal values in parquet
[ https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rahul Challapalli closed DRILL-4704.
------------------------------------
Verified and added tests
> select statement behavior is inconsistent for decimal values in parquet
> -----------------------------------------------------------------------
>
> Key: DRILL-4704
> URL: https://issues.apache.org/jira/browse/DRILL-4704
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.6.0
> Environment: Windows 7 Pro, Java 1.8.0_91
> Reporter: Dave Oshinsky
> Fix For: 1.8.0
>
>
> A select statement that searches a parquet file for a decimal value matching a specific value behaves inconsistently. The query expressed most simply finds nothing:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id = 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | HIRE_DATE |
> +--------------+-------------+------------+--------+---------------+-----------+
> +--------------+-------------+------------+--------+---------------+-----------+
> No rows selected (0.348 seconds)
> The query can be modified to find the matching row in a few ways, such as the following (using between instead of '=', changing 100 to 100.0, or casting as decimal:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id between 100 and 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100 | Steven | King | SKING | 515.123.4567 | 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.226 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id = 100.0;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100 | Steven | King | SKING | 515.123.4567 | 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.259 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where cast(employee_id AS DECIMAL) = 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100 | Steven | King | SKING | 515.123.4567 | 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.232 seconds)
> 0: jdbc:drill:zk=local>
> The schema of the parquet data that is being searched is as follows:
> $ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet
> file: file:/c:/archiveHR/HR.EMPLOYEES/1.parquet
> creator: parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
> .....
> file schema: HR.EMPLOYEES
> --------------------------------------------------------------------------------
> EMPLOYEE_ID: REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0
> FIRST_NAME: OPTIONAL BINARY O:UTF8 R:0 D:1
> LAST_NAME: REQUIRED BINARY O:UTF8 R:0 D:0
> EMAIL: REQUIRED BINARY O:UTF8 R:0 D:0
> PHONE_NUMBER: OPTIONAL BINARY O:UTF8 R:0 D:1
> HIRE_DATE: REQUIRED BINARY O:UTF8 R:0 D:0
> JOB_ID: REQUIRED BINARY O:UTF8 R:0 D:0
> SALARY: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> MANAGER_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> DEPARTMENT_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> row group 1: RC:107 TS:9943 OFFSET:4
> --------------------------------------------------------------------------------
> EMPLOYEE_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 VC:107 ENC:PLAIN,BIT_PACKED
> FIRST_NAME: BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> LAST_NAME: BINARY SNAPPY DO:0 FPO:1266 SZ:913/1111/1.22 VC:107 ENC:PLAIN,BIT_PACKED
> EMAIL: BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 ENC:PLAIN,BIT_PACKED
> PHONE_NUMBER: BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> HIRE_DATE: BINARY SNAPPY DO:0 FPO:3906 SZ:874/2636/3.02 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
> JOB_ID: BINARY SNAPPY DO:0 FPO:4780 SZ:254/302/1.19 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
> SALARY: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5034 SZ:419/580/1.38 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> COMMISSION_PCT: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5453 SZ:97/113/1.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> MANAGER_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5550 SZ:168/363/2.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> DEPARTMENT_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5718 SZ:94/254/2.70 VC:107 ENC:PLAIN,RLE,BIT_PACKED
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)