Posted to issues@drill.apache.org by "Vitalii Diravka (JIRA)" <ji...@apache.org> on 2016/12/08 18:04:58 UTC
[jira] [Comment Edited] (DRILL-4996) Parquet Date auto-correction
is not working in auto-partitioned parquet files generated by drill-1.6
[ https://issues.apache.org/jira/browse/DRILL-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645729#comment-15645729 ]
Vitalii Diravka edited comment on DRILL-4996 at 12/8/16 6:04 PM:
-----------------------------------------------------------------
Right. Moreover, drill-1.6.0 (which generated that file) will show incorrect date values too,
because before the DRILL-4203 fix Drill could not read correct date values from parquet files.
To check whether the date values are correct, you can use parquet-tools.
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar meta /home/vitalii/Downloads/1.6/0_0_1.parquet
file: file:/home/vitalii/Downloads/1.6/0_0_1.parquet
creator: parquet-mr version 1.8.1-drill-r0 (build 6b605a4ea05b66e1a6bf843353abcb4834a4ced8)
extra: drill.version = 1.6.0
file schema: root
--------------------------------------------------------------------------------
i_rec_start_date: OPTIONAL INT32 O:DATE R:0 D:1
i_rec_end_date: OPTIONAL INT32 O:DATE R:0 D:1
...
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /home/vitalii/Downloads/1.6/0_0_1.parquet
....
i_rec_start_date = 10161
i_rec_end_date = 10891
.....
{code}
Incorrect values would be greater than 4881176, so the values shown above are correct.
> Parquet Date auto-correction is not working in auto-partitioned parquet files generated by drill-1.6
> ----------------------------------------------------------------------------------------------------
>
> Key: DRILL-4996
> URL: https://issues.apache.org/jira/browse/DRILL-4996
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Reporter: Rahul Challapalli
> Assignee: Vitalii Diravka
> Priority: Critical
> Attachments: item.tgz
>
>
> git.commit.id.abbrev=4ee1d4c
> Below are the steps I followed to generate the data:
> {code}
> 1. Generate a parquet file with date column using hive1.2
> 2. Use drill 1.6 to create auto-partitioned parquet files partitioned on the date column
> {code}
> Now the query below returns wrong results:
> {code}
> select i_rec_start_date, i_size from dfs.`/drill/testdata/parquet_date/auto_partition/item_multipart_autorefresh` group by i_rec_start_date, i_size;
> +-------------------+--------------+
> | i_rec_start_date  | i_size       |
> +-------------------+--------------+
> | null              | large        |
> | 366-11-08         | extra large  |
> | 366-11-08         | medium       |
> | null              | medium       |
> | 366-11-08         | petite       |
> | 364-11-07         | medium       |
> | null              | petite       |
> | 365-11-07         | medium       |
> | 368-11-07         | economy      |
> | 365-11-07         | large        |
> | 365-11-07         | small        |
> | 366-11-08         | small        |
> | 365-11-07         | extra large  |
> | 364-11-07         | N/A          |
> | 366-11-08         | economy      |
> | 366-11-08         | large        |
> | 364-11-07         | small        |
> | null              | small        |
> | 364-11-07         | large        |
> | 364-11-07         | extra large  |
> | 368-11-07         | N/A          |
> | 368-11-07         | extra large  |
> | 368-11-07         | large        |
> | 365-11-07         | petite       |
> | null              | N/A          |
> | 365-11-07         | economy      |
> | 364-11-07         | economy      |
> | 364-11-07         | petite       |
> | 365-11-07         | N/A          |
> | 368-11-07         | medium       |
> | null              | extra large  |
> | 368-11-07         | small        |
> | 368-11-07         | petite       |
> | 366-11-08         | N/A          |
> +-------------------+--------------+
> 34 rows selected (0.691 seconds)
> {code}
> However, when I generated the auto-partitioned parquet files using Drill 1.2, the above query returned the right results.
> I attached the required data sets.