Posted to user@hive.apache.org by hbaseuser hbaseuser <hb...@gmail.com> on 2014/05/22 00:48:13 UTC

JSON Data with "\n" character..

Hi,

I'm trying to process JSON data in Hive (0.12) with "\n" inside some of the
keys and values. It is messed up, and I have no control over changing the
input.

What is the best way to process this data in hdfs?

Thanks!

Re: JSON Data with "\n" character..

Posted by Andrew Mains <an...@kontagent.com>.
Hi,

We've run into this issue as well, and it is indeed annoying. As I
recall, the problem arises not when the records are read off disk but
when Hive handles the records further down the line (I forget exactly
where).

I believe this issue is relevant: 
https://issues.apache.org/jira/browse/HIVE-1898 . If you can't 
preprocess the input to clean it up, the suggestion there of using

regexp_replace(<my_column>, "\n", "")

might be useful.

Our (rather clunky) workaround was to do the replacement in our SerDe 
(we were already using a custom SerDe, so this wasn't a huge burden for 
us).
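
For illustration only, here is a minimal Python sketch of that kind of record-level replacement. It is not our SerDe code (that lives inside a Java SerDe), and the names strip_newlines and clean_record are hypothetical. It assumes one JSON document per input line, with the "\n" appearing as an escape sequence inside keys and string values, and it removes the newline rather than substituting a space:

```python
import json


def strip_newlines(node, replacement=""):
    """Recursively replace newline characters in every key and string value."""
    if isinstance(node, str):
        return node.replace("\n", replacement)
    if isinstance(node, dict):
        # Keys of a parsed JSON object are always strings, so clean them too.
        return {
            strip_newlines(key, replacement): strip_newlines(value, replacement)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [strip_newlines(item, replacement) for item in node]
    # Numbers, booleans and null pass through unchanged.
    return node


def clean_record(line):
    """Parse one JSON record, strip embedded newlines, and re-serialize it."""
    return json.dumps(strip_newlines(json.loads(line)))
```

Something like this could run as a preprocessing pass over the files, or as a streaming script if changing the files themselves is not an option. Note that two keys differing only by a newline would collapse into one after cleaning.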

What does your CREATE TABLE statement look like?

Andrew
