Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2015/03/07 02:21:38 UTC

[jira] [Commented] (DRILL-2267) Parquet writer with dictionary encoding results in corrupted varchar columns

    [ https://issues.apache.org/jira/browse/DRILL-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351279#comment-14351279 ] 

Steven Phillips commented on DRILL-2267:
----------------------------------------

+1
lgtm

> Parquet writer with dictionary encoding results in corrupted varchar columns
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-2267
>                 URL: https://issues.apache.org/jira/browse/DRILL-2267
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Ramana Inukonda Nagaraj
>            Assignee: Steven Phillips
>             Fix For: 0.8.0
>
>         Attachments: 0_0_0.parquet, DRILL-2267.1.patch.txt
>
>
> Using CTAS, I created a Parquet file through Drill with columns of the varchar datatype.
> The created Parquet file looks like this when inspected with parquet-tools:
> VARCHAR_col:         OPTIONAL BINARY O:UTF8 R:0 D:1
> VAR16CHAR_col:       OPTIONAL BINARY O:UTF8 R:0 D:1
> VARCHAR_col:          BINARY SNAPPY DO:0 FPO:894307 SZ:16344/231716/14.18 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
> VAR16CHAR_col:        BINARY SNAPPY DO:0 FPO:910651 SZ:25830/381493/14.77 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
> When the file is queried, several records show corrupted data in these fields:
> | VAR16CHAR_col |
> +---------------+
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> If dictionary encoding is turned off, the resulting file can be read without these issues.


