You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Dave Oshinsky (JIRA)" <ji...@apache.org> on 2016/06/02 15:05:59 UTC
[jira] [Created] (DRILL-4704) select statement behavior is
inconsistent for decimal values in parquet
Dave Oshinsky created DRILL-4704:
------------------------------------
Summary: select statement behavior is inconsistent for decimal values in parquet
Key: DRILL-4704
URL: https://issues.apache.org/jira/browse/DRILL-4704
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Affects Versions: 1.6.0
Environment: Windows 7 Pro, Java 1.8.0_91
Reporter: Dave Oshinsky
Fix For: 1.7.0
A select statement that searches a parquet file for a decimal value matching a specific value behaves inconsistently. The query expressed most simply finds nothing:
0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id = 100;
+--------------+-------------+------------+--------+---------------+-----------+
| EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | HIRE_DATE |
+--------------+-------------+------------+--------+---------------+-----------+
+--------------+-------------+------------+--------+---------------+-----------+
No rows selected (0.348 seconds)
The query can be modified to find the matching row in a few ways, such as the following (using between instead of '=', changing 100 to 100.0, or casting as decimal:
0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id between 100 and 100;
+--------------+-------------+------------+--------+---------------+-----------+
| EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | HIR |
+--------------+-------------+------------+--------+---------------+-----------+
| 100 | Steven | King | SKING | 515.123.4567 | 2003-06-1 |
+--------------+-------------+------------+--------+---------------+-----------+
1 row selected (0.226 seconds)
0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id = 100.0;
+--------------+-------------+------------+--------+---------------+-----------+
| EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | HIR |
+--------------+-------------+------------+--------+---------------+-----------+
| 100 | Steven | King | SKING | 515.123.4567 | 2003-06-1 |
+--------------+-------------+------------+--------+---------------+-----------+
1 row selected (0.259 seconds)
0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where cast(employee_id AS DECIMAL) = 100;
+--------------+-------------+------------+--------+---------------+-----------+
| EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | HIR |
+--------------+-------------+------------+--------+---------------+-----------+
| 100 | Steven | King | SKING | 515.123.4567 | 2003-06-1 |
+--------------+-------------+------------+--------+---------------+-----------+
1 row selected (0.232 seconds)
0: jdbc:drill:zk=local>
The schema of the parquet data that is being searched is as follows:
$ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet
file: file:/c:/archiveHR/HR.EMPLOYEES/1.parquet
creator: parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
.....
file schema: HR.EMPLOYEES
--------------------------------------------------------------------------------
EMPLOYEE_ID: REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0
FIRST_NAME: OPTIONAL BINARY O:UTF8 R:0 D:1
LAST_NAME: REQUIRED BINARY O:UTF8 R:0 D:0
EMAIL: REQUIRED BINARY O:UTF8 R:0 D:0
PHONE_NUMBER: OPTIONAL BINARY O:UTF8 R:0 D:1
HIRE_DATE: REQUIRED BINARY O:UTF8 R:0 D:0
JOB_ID: REQUIRED BINARY O:UTF8 R:0 D:0
SALARY: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
MANAGER_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
DEPARTMENT_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
row group 1: RC:107 TS:9943 OFFSET:4
--------------------------------------------------------------------------------
EMPLOYEE_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 VC:107 ENC:PLAIN,BIT_PACKED
FIRST_NAME: BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
LAST_NAME: BINARY SNAPPY DO:0 FPO:1266 SZ:913/1111/1.22 VC:107 ENC:PLAIN,BIT_PACKED
EMAIL: BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 ENC:PLAIN,BIT_PACKED
PHONE_NUMBER: BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 ENC:PLAIN,RLE,BIT_PACKED
HIRE_DATE: BINARY SNAPPY DO:0 FPO:3906 SZ:874/2636/3.02 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
JOB_ID: BINARY SNAPPY DO:0 FPO:4780 SZ:254/302/1.19 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
SALARY: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5034 SZ:419/580/1.38 VC:107 ENC:PLAIN,RLE,BIT_PACKED
COMMISSION_PCT: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5453 SZ:97/113/1.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
MANAGER_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5550 SZ:168/363/2.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
DEPARTMENT_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5718 SZ:94/254/2.70 VC:107 ENC:PLAIN,RLE,BIT_PACKED
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)