Posted to user@hive.apache.org by "Dhandapani, Karthik" <Ka...@CVSCaremark.com> on 2015/03/27 18:00:45 UTC

Issue with a new line character in the data

Hi,

I have a scenario where newline characters exist in the data. Because of them, the number of records in the target is higher than in the source: every record that contains a newline character is broken and appears as 2 records in Hive. When I use cat and pipe it to wc -l, I get the right counts, but when I use Hadoop streaming to get the counts from the HDFS files, I get more records because of the newline characters. In the Hive external table, too, the record count is higher, and each affected record is split into 2 records at the position of the newline. Is there a workaround in Sqoop/Hive to handle this scenario, so that Hive can ignore a newline character when it is part of the data?
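
To make the symptom concrete, here is a minimal Python sketch of what seems to be happening. The sample rows, the Ctrl-A field delimiter, and the helper names are all made up for illustration and are not taken from the actual tables or job. It only shows that a line-oriented reader counts one extra record for every newline embedded in a field, and that removing newlines from the field values before writing brings the count back in line with the source.

```python
# Hypothetical sample data: two source records, the second one has a
# newline embedded inside a text field.
SOURCE_RECORDS = [
    ["1", "first row", "ok"],
    ["2", "second\nrow", "ok"],
]

# Assumption: fields are separated by Ctrl-A and records by "\n".
FIELD_DELIM = "\x01"
RECORD_DELIM = "\n"


def write_text(records):
    """Serialize records as delimited text, one record per line."""
    return "".join(FIELD_DELIM.join(r) + RECORD_DELIM for r in records)


def count_line_records(text):
    """Count records the way a line-oriented reader does: one per newline."""
    return text.count(RECORD_DELIM)


def strip_newlines(records):
    """Remove carriage returns and newlines from every field value."""
    return [[f.replace("\r", "").replace("\n", "") for f in r] for r in records]


raw = write_text(SOURCE_RECORDS)
cleaned = write_text(strip_newlines(SOURCE_RECORDS))

print("source records:      ", len(SOURCE_RECORDS))
print("lines, raw data:     ", count_line_records(raw))
print("lines, cleaned data: ", count_line_records(cleaned))
```

With this sample the raw text has one more line than there are source records, which matches the extra rows I see in Hive.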

We are on HDP 2.1 with Sqoop 1.4.4 and Hive 0.13.

Appreciate your help with this.

Thanks,
Karthik