Posted to issues@drill.apache.org by "Stefán Baxter (JIRA)" <ji...@apache.org> on 2015/11/10 21:15:11 UTC
[jira] [Comment Edited] (DRILL-4056) Avro deserialization corrupts data
[ https://issues.apache.org/jira/browse/DRILL-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999231#comment-14999231 ]
Stefán Baxter edited comment on DRILL-4056 at 11/10/15 8:14 PM:
----------------------------------------------------------------
This is a demo file.
The bug can be triggered by selecting:
select s.classification.type from dfs.asa.`/<filename>` as s;
Lines 41 and 42 are wrong: they show "musikr" where it should be "musik", and the extra "r" seems to come from the previous entry, "teater".
The sample can be a bit misleading because it contains the word for music in two languages, "musik" and "musikk", so please don't let that "extra" k throw you off track.
was (Author: acmeguy):
This is a demo file.
The bug can be triggered by selecting:
select s.classification.type from dfs.asa.`/<filename>` as s;
line 41 and 42 are wrong "musikr" it should be "musik" but it seems to get the extra r from the previous entry "teater".
The sample can be a bit misleading as it contains music in two languages, musik and musikk, so please don't let that "extra" k throw you off track.
> Avro deserialization corrupts data
> ----------------------------------
>
> Key: DRILL-4056
> URL: https://issues.apache.org/jira/browse/DRILL-4056
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Other
> Affects Versions: 1.3.0
> Environment: Ubuntu 15.04 - Oracle Java
> Reporter: Stefán Baxter
> Fix For: 1.3.0
>
> Attachments: test.zip
>
>
> I have an Avro file that supports the following data/schema:
> {"field":"some", "classification":{"variant":"Gæst"}}
> When I select 10 rows from this file I get:
> +---------------------+
> | EXPR$0 |
> +---------------------+
> | Gæst |
> | Voksen |
> | Voksen |
> | Invitation KIF KBH |
> | Invitation KIF KBH |
> | Ordinarie pris KBH |
> | Ordinarie pris KBH |
> | Biljetter 200 krBH |
> | Biljetter 200 krBH |
> | Biljetter 200 krBH |
> +---------------------+
> The bug is that field values are deserialized incorrectly: when a row's value is shorter than the value in the previous row, the trailing characters of the previous value are retained.
> The SQL query:
> "select s.classification.variant variant from dfs.<some> as s limit 10;"
> As a result, "Ordinarie pris" becomes "Ordinarie pris KBH" because the previous row had the value "Invitation KIF KBH".
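The symptom described above (a shorter value keeping the tail of the longer value before it) is what one would expect if a byte buffer were reused between rows and then read in full instead of only up to the current value's length. The Python sketch below only illustrates that general pattern. It is an assumption, not a confirmed diagnosis, and it is not Drill or Avro code: ReusedBuffer, read_buggy, read_correct and replay are hypothetical names. The sample values are ASCII-only so that the leftover bytes always decode cleanly.

```python
class ReusedBuffer:
    """Hypothetical reader that keeps one byte buffer for all rows."""

    def __init__(self):
        self.buf = bytearray()  # backing storage, grows but never shrinks
        self.length = 0         # number of valid bytes for the current row

    def load(self, value):
        data = value.encode("utf-8")
        if len(data) > len(self.buf):
            # Grow the buffer only when the new value does not fit.
            self.buf.extend(bytes(len(data) - len(self.buf)))
        # Overwrite the front of the buffer; older bytes past len(data) remain.
        self.buf[:len(data)] = data
        self.length = len(data)

    def read_buggy(self):
        # Assumed mistake: decode the whole buffer and ignore self.length.
        return self.buf.decode("utf-8")

    def read_correct(self):
        # Decode only the bytes that belong to the current row.
        return self.buf[:self.length].decode("utf-8")


def replay(values):
    """Return (buggy, correct) readings for each value, in order."""
    reader = ReusedBuffer()
    results = []
    for value in values:
        reader.load(value)
        results.append((reader.read_buggy(), reader.read_correct()))
    return results


if __name__ == "__main__":
    for buggy, correct in replay(["teater", "musik"]):
        print(repr(buggy), repr(correct))
    # prints:
    # 'teater' 'teater'
    # 'musikr' 'musik'
```

Replaying "Invitation KIF KBH", "Ordinarie pris" and "Biljetter 200 kr" through read_buggy gives "Ordinarie pris KBH" and "Biljetter 200 krBH", the same corrupted values shown in the result table, while read_correct returns the values unchanged.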
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)