Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2015/03/07 02:21:38 UTC
[jira] [Commented] (DRILL-2267) Parquet writer with dictionary encoding results in corrupted varchar columns
[ https://issues.apache.org/jira/browse/DRILL-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351279#comment-14351279 ]
Steven Phillips commented on DRILL-2267:
----------------------------------------
+1
lgtm
> Parquet writer with dictionary encoding results in corrupted varchar columns
> ----------------------------------------------------------------------------
>
> Key: DRILL-2267
> URL: https://issues.apache.org/jira/browse/DRILL-2267
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Reporter: Ramana Inukonda Nagaraj
> Assignee: Steven Phillips
> Fix For: 0.8.0
>
> Attachments: 0_0_0.parquet, DRILL-2267.1.patch.txt
>
>
> Using CTAS, I created a Parquet file through Drill with varchar columns.
> The created Parquet file looks like this in parquet-tools:
> VARCHAR_col: OPTIONAL BINARY O:UTF8 R:0 D:1
> VAR16CHAR_col: OPTIONAL BINARY O:UTF8 R:0 D:1
> VARCHAR_col: BINARY SNAPPY DO:0 FPO:894307 SZ:16344/231716/14.18 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
> VAR16CHAR_col: BINARY SNAPPY DO:0 FPO:910651 SZ:25830/381493/14.77 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
> When querying the file, several records show corrupted data in these fields:
> | VAR16CHAR_col |
> +---------------+
> | ������������ |
> | |
> | �������� |
> | ����� |
> | �� |
> | |
> | |
> | �� |
> | ������������ |
> | |
> | �������� |
> | ����� |
> | �� |
> | |
> | |
> | �� |
> | ������������ |
> | |
> | �������� |
> | ����� |
> | �� |
> | |
> | |
> | �� |
> | ������������ |
> | |
> | �������� |
> | ����� |
> | �� |
> | |
> | |
> | �� |
> | ������������ |
> | |
> | �������� |
> | ����� |
> | �� |
> If dictionary encoding is turned off, the resulting file can be read without these issues.
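For context on the PLAIN_DICTIONARY encoding listed in the parquet-tools output above, here is a minimal Python sketch of the round trip a dictionary-encoded BINARY (UTF8) column has to survive. It assumes the Parquet-format layout in which the dictionary page stores each distinct value once in PLAIN encoding (a 4-byte little-endian length followed by the bytes) and the data pages store indices into that dictionary. As a simplification, the indices are kept as a plain Python list rather than the RLE/bit-packed hybrid encoding Parquet actually uses for them. This is not Drill's writer or reader code, and all function names are hypothetical. The bug reported here is a case where the values read back do not match the values written.

```python
import struct

def plain_encode_byte_arrays(values):
    """PLAIN-encode BYTE_ARRAY values: 4-byte little-endian length, then the raw bytes."""
    out = bytearray()
    for v in values:
        out += struct.pack("<I", len(v))
        out += v
    return bytes(out)

def plain_decode_byte_arrays(buf, count):
    """Read `count` length-prefixed byte arrays back out of `buf`."""
    values = []
    pos = 0
    for _ in range(count):
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        values.append(buf[pos:pos + n])
        pos += n
    return values

def dictionary_encode(values):
    """Return (dictionary page bytes, dictionary size, per-row indices).

    Simplification: indices are a plain list, not RLE/bit-packed as in real Parquet.
    """
    dictionary, index_of, indices = [], {}, []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return plain_encode_byte_arrays(dictionary), len(dictionary), indices

def dictionary_decode(dict_page, dict_size, indices):
    """Rebuild the column by looking each index up in the decoded dictionary."""
    dictionary = plain_decode_byte_arrays(dict_page, dict_size)
    return [dictionary[i] for i in indices]

# Usage: a correct writer/reader pair must return exactly the bytes that were written.
column = [s.encode("utf-8") for s in ["apple", "banana", "apple", "", "cherry", "banana"]]
page, size, idx = dictionary_encode(column)
assert dictionary_decode(page, size, idx) == column
```

Repeated values such as "apple" and "banana" are stored once in the dictionary page and referenced by index, which is where dictionary encoding saves space on low-cardinality varchar columns.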
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)