Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/07/03 18:06:00 UTC
[jira] [Commented] (DRILL-5239) Drill text reader reports wrong results when column value starts with '#'
[ https://issues.apache.org/jira/browse/DRILL-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072784#comment-16072784 ]
Paul Rogers commented on DRILL-5239:
------------------------------------
Note that the '#' symbol is used in some formats to indicate a comment:
{code}
# Exported from server abcd at 2017:07:01T01:00:00
# Server log version 2.3
time,recv-ip,bytes,status,...
<data>
{code}
Since some formats allow this, one solution might be to add an option (off by default) that permits comment lines in the header. If the option is off, '#' is just another character. If it is on, we skip comment lines until we find the header. In neither case do we need to allow comment lines in the data section.
There is probably a write-up somewhere of a format that allows such comments. Perhaps we can track it down as a reference. (I saw the format used with web logs a few jobs back.)
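A minimal Java sketch of the proposed option, assuming '#' is the comment marker and that the first line not skipped is the header. `HeaderCommentSketch`, `readLines`, and the `allowHeaderComments` flag are hypothetical names for illustration, not Drill's reader API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed option; not Drill's actual reader code.
public class HeaderCommentSketch {

    /**
     * Returns the header line followed by the data lines.
     * If allowHeaderComments is true, lines starting with '#' are skipped,
     * but only until the header is found; after that '#' is ordinary data.
     * If it is false, '#' is just another character and nothing is skipped.
     */
    public static List<String> readLines(String text, boolean allowHeaderComments) {
        List<String> out = new ArrayList<>();
        boolean headerSeen = false;
        for (String line : text.split("\\R")) {
            if (!headerSeen && allowHeaderComments && line.startsWith("#")) {
                continue; // comment line ahead of the header: skip it
            }
            headerSeen = true; // the first line kept is treated as the header
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up file in the shape of the example above; the last data row
        // starts with '#' and must survive as data.
        String file = "# Exported from server abcd at 2017:07:01T01:00:00\n"
                + "# Server log version 2.3\n"
                + "time,recv-ip,bytes,status\n"
                + "#tag,10.0.0.1,512,200\n";
        System.out.println(readLines(file, true));
        System.out.println(readLines(file, false));
    }
}
```

The key design choice is that comment handling stops as soon as the header is seen, so a data value that happens to begin with '#' is never mistaken for a comment.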
> Drill text reader reports wrong results when column value starts with '#'
> -------------------------------------------------------------------------
>
> Key: DRILL-5239
> URL: https://issues.apache.org/jira/browse/DRILL-5239
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.10.0
> Reporter: Rahul Challapalli
> Assignee: Roman
> Priority: Blocker
>
> git.commit.id.abbrev=2af709f
> Data set:
> {code}
> D|32
> 8h|234
> ;#|3489
> ^$*(|308
> #|98
> {code}
> Wrong result (the last row is missing):
> {code}
> select columns[0] as col1, columns[1] as col2 from dfs.`/drill/testdata/wtf2.tbl`;
> +-------+-------+
> | col1 | col2 |
> +-------+-------+
> | D | 32 |
> | 8h | 234 |
> | ;# | 3489 |
> | ^$*( | 308 |
> +-------+-------+
> 4 rows selected (0.233 seconds)
> {code}
> However, the issue does not happen with a Parquet file.
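To make the failure mode concrete, here is a minimal Java sketch that assumes (this thread does not confirm it) that the text reader discards any line whose first character is '#' as a comment. `CommentDropSketch` and `parse` are hypothetical names. Under that assumption, `;#|3489` survives because '#' is not its first character, while `#|98` is dropped, which matches the four rows returned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how a row can vanish; not Drill's actual reader code.
public class CommentDropSketch {

    /**
     * Splits '|'-delimited text into rows. If commentChar is non-null, any
     * line whose first character equals it is discarded as a comment.
     * Pass null to disable comment handling entirely.
     */
    public static List<String[]> parse(String text, Character commentChar) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (commentChar != null && !line.isEmpty() && line.charAt(0) == commentChar) {
                continue; // the whole line is dropped, data and all
            }
            rows.add(line.split("\\|", -1)); // "\\|" matches a literal '|'
        }
        return rows;
    }

    public static void main(String[] args) {
        // The data set from the issue description.
        String data = "D|32\n8h|234\n;#|3489\n^$*(|308\n#|98\n";
        System.out.println(parse(data, '#').size());  // prints 4
        System.out.println(parse(data, null).size()); // prints 5
    }
}
```

Passing `null` keeps all five rows, which is the result the query should have returned.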