Posted to issues@hive.apache.org by "Aihua Xu (JIRA)" <ji...@apache.org> on 2016/01/13 16:50:39 UTC

[jira] [Updated] (HIVE-11785) Support escaping carriage return and new line for LazySimpleSerDe

     [ https://issues.apache.org/jira/browse/HIVE-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated HIVE-11785:
----------------------------
    Hadoop Flags: Incompatible change
    Release Note: This change disallows carriage return and new line characters as field separators or as the escape character. Before this change they were allowed, but they could easily produce incorrect results when the content itself contained a carriage return or new line: even if those characters were escaped, the line-based input formats that Hive uses in MapReduce still break records at every carriage return and new line.
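
The following Python sketch illustrates the point of the release note. It is a simplified model, not Hive's code: `split_lines`, `escape_keep_raw`, `escape_as_letters` and `unescape_letters` are hypothetical helpers, and the backslash escape character is an assumption. Prefixing a raw carriage return or new line with an escape character does not help, because a line-based reader splits on the raw characters before any unescaping happens. Writing them as two-character sequences keeps the record on one line.

```python
import re


def split_lines(text):
    """Split on CR, LF, or CRLF, roughly as a line-based reader would."""
    return [line for line in re.split(r"\r\n|\r|\n", text) if line]


def escape_keep_raw(value, esc="\\"):
    """Prefix CR/LF with the escape character but keep the raw character."""
    return re.sub(r"([\r\n])", lambda m: esc + m.group(1), value)


def escape_as_letters(value, esc="\\"):
    """Escape the escape character first, then write CR/LF as esc+'r' / esc+'n'."""
    value = value.replace(esc, esc + esc)
    return value.replace("\r", esc + "r").replace("\n", esc + "n")


def unescape_letters(value, esc="\\"):
    """Inverse of escape_as_letters."""
    out, i = [], 0
    while i < len(value):
        if value[i] == esc and i + 1 < len(value):
            nxt = value[i + 1]
            out.append({"r": "\r", "n": "\n"}.get(nxt, nxt))
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


value = "both\r\nhere"  # assumed field content, for illustration only
raw_record = "3\t" + escape_keep_raw(value) + "\n"
safe_record = "3\t" + escape_as_letters(value) + "\n"

# The raw characters still split the record, escaped or not.
print(len(split_lines(raw_record)))
# The two-character form survives as a single line and round-trips.
print(len(split_lines(safe_record)))
print(unescape_letters(escape_as_letters(value)) == value)
```

In this model the first record is broken into three fragments, while the second stays on one line and decodes back to the original value.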

> Support escaping carriage return and new line for LazySimpleSerDe
> -----------------------------------------------------------------
>
>                 Key: HIVE-11785
>                 URL: https://issues.apache.org/jira/browse/HIVE-11785
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>              Labels: TODOC2.0
>             Fix For: 2.0.0
>
>         Attachments: HIVE-11785.2.patch, HIVE-11785.3.patch, HIVE-11785.patch, test.parquet
>
>
> Create the table and perform the queries as follows. You will see different results depending on the value of hive.fetch.task.conversion.
> The expected result is:
> {noformat}
> 1	newline
> here
> 2	carriage return
> 3	both
> here
> {noformat}
> {noformat}
> hive> create table repo (lvalue int, charstring string) stored as parquet;
> OK
> Time taken: 0.34 seconds
> hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
> Loading data to table default.repo
> chgrp: changing ownership of 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not belong to hive
> Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, rawDataSize=0]
> OK
> Time taken: 0.732 seconds
> hive> set hive.fetch.task.conversion=more;
> hive> select * from repo;
> OK
> 1	newline
> here
> here	carriage return
> 3	both
> here
> Time taken: 0.253 seconds, Fetched: 3 row(s)
> hive> set hive.fetch.task.conversion=none;
> hive> select * from repo;
> Query ID = root_20150909113535_e081db8b-ccd9-4c44-aad9-d990ffb8edf3
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1441752031022_0006, Tracking URL = http://host-10-17-81-63.coe.cloudera.com:8088/proxy/application_1441752031022_0006/
> Kill Command = /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop/bin/hadoop job  -kill job_1441752031022_0006
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2015-09-09 11:35:54,127 Stage-1 map = 0%,  reduce = 0%
> 2015-09-09 11:36:04,664 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.98 sec
> MapReduce Total cumulative CPU time: 2 seconds 980 msec
> Ended Job = job_1441752031022_0006
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1   Cumulative CPU: 2.98 sec   HDFS Read: 4251 HDFS Write: 51 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 980 msec
> OK
> 1	newline
> NULL	NULL
> 2	carriage return
> NULL	NULL
> 3	both
> NULL	NULL
> Time taken: 25.131 seconds, Fetched: 6 row(s)
> hive>
> {noformat}
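
The NULL rows in the MapReduce run can be reproduced with a small Python model. This is a sketch under stated assumptions, not the actual Hive code path: the three field values are guessed from the console output above, the intermediate result is assumed to be tab-separated text with one row per line, and `parse_row` is a hypothetical stand-in for reading an (int, string) row.

```python
import re

# Assumed contents of test.parquet, inferred from the console output.
rows = [
    (1, "newline\nhere"),
    (2, "carriage return\rhere"),
    (3, "both\r\nhere"),
]

# Write each row as "<int>\t<string>\n" without escaping CR or LF.
text = "".join(f"{lvalue}\t{charstring}\n" for lvalue, charstring in rows)

# A line-based reader treats CR, LF and CRLF as record terminators.
lines = [line for line in re.split(r"\r\n|\r|\n", text) if line]


def parse_row(line):
    """Parse one line as (int, string); unparsable or missing fields become None."""
    fields = line.split("\t")
    try:
        lvalue = int(fields[0])
    except ValueError:
        lvalue = None
    charstring = fields[1] if len(fields) > 1 else None
    return lvalue, charstring


parsed = [parse_row(line) for line in lines]
for lvalue, charstring in parsed:
    print("\t".join("NULL" if v is None else str(v) for v in (lvalue, charstring)))
```

Under these assumptions the three input rows come back as six, with every fragment after an embedded carriage return or new line read as a row of NULLs, which is the same pattern as the output with hive.fetch.task.conversion=none.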



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)