Posted to user@hive.apache.org by Jay Ramadorai <jr...@tripadvisor.com> on 2011/01/27 20:27:11 UTC

Newlines in data

How can we preserve newlines in data in Hive columns? Currently newlines in data terminate the Hive record.

I am using Sqoop to import data into Hive from external databases. Even with the ESCAPED BY clause of CREATE TABLE, Hive does not escape newlines. I see there is a JIRA for this (https://issues.apache.org/jira/browse/HIVE-1898). How do others deal with preserving newlines? Is there any way short of replacing all newlines before importing? Is there a custom input format class that can use a different line terminator?

Re: Newlines in data

Posted by Viral Bajaria <vi...@gmail.com>.
Well, even I learned the hard way that Hive does not obey escaping (at least
not in its current iteration). Correct me if I am wrong.

For now, I strip the column separators (\t) and line separators (\n) used in
our table definitions out of the data. When I was using Hive 0.3, it allowed
\r\n as the line separator; after upgrading to Hive 0.5 it no longer did, and
I had to move to \n. For some reason I still have to get rid of \r in my text
data columns to avoid data corruption.

In short, yes: you should replace column separators, \n, and \r in your
data before loading it into Hive.
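
A minimal sketch of that cleanup step, assuming tab-delimited text files and
a Python preprocessing script run before the load (neither is specified in
this thread). The names clean_field and to_hive_line are hypothetical helpers,
not part of Hive or Sqoop:

```python
import re

# Characters that would break a tab-delimited, newline-terminated text table.
_SEPARATORS = re.compile(r"[\t\r\n]+")


def clean_field(value, replacement=" "):
    """Replace each run of tab, CR, or LF in one field value with `replacement`."""
    return _SEPARATORS.sub(replacement, value)


def to_hive_line(fields):
    """Join cleaned fields into one tab-delimited record (no trailing newline)."""
    return "\t".join(clean_field(f) for f in fields)


if __name__ == "__main__":
    row = ["42", "first line\r\nsecond line", "has\ta tab"]
    print(to_hive_line(row))
```

Note that this discards the original line breaks. If they need to be
recoverable later, one option is to pass a placeholder token as the
replacement instead of a space and reverse the substitution at query time.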
