Posted to user@hive.apache.org by Andre Araujo <ar...@pythian.com> on 2014/01/07 02:17:08 UTC

RCFILE and "\n" characters

The example below shows that the RCFILE SerDe doesn't handle "\n" in string
fields correctly.

It seems that the SerDe uses "\n" internally as a record delimiter but
fails to serialize/deserialize it correctly when it appears within a field.
Is that correct?
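
To make the suspicion concrete, here is a minimal Python sketch of the kind of
failure I have in mind. It is not Hive or RCFile code: the delimiters and the
write_records/read_records helpers are hypothetical, and it only assumes that
somewhere in the path records are written newline-delimited without escaping.

```python
# Hypothetical illustration only: this is not Hive or RCFile code.
# Assumption: records are joined with "\n" and fields with "\t", unescaped.
FIELD_SEP = "\t"
RECORD_SEP = "\n"


def write_records(rows):
    """Join fields and records with plain delimiters, with no escaping."""
    return "".join(FIELD_SEP.join(row) + RECORD_SEP for row in rows)


def read_records(text):
    """Split on the record delimiter, the way a line-oriented reader would."""
    # Drop the empty string left after the final record delimiter.
    lines = text.split(RECORD_SEP)[:-1]
    return [line.split(FIELD_SEP) for line in lines]


# One logical row whose second field ends with a newline, as in the test case.
rows = [["1", "John\n", "Doe"]]
encoded = write_records(rows)
decoded = read_records(encoded)

print(len(rows), "record written,", len(decoded), "records read back")
print(decoded)
```

With a single row written, the reader gets back two records, ['1', 'John'] and
['', 'Doe'], which looks a lot like the "select *" output further down.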

Any ideas on how to work around that?

Thanks,
Andre


$ echo X > dual.data
$ hive

hive> CREATE  TABLE araujo_sandbox.dual(dummy string) stored as textfile;
OK

hive> use araujo_sandbox;
OK

hive> LOAD DATA LOCAL INPATH 'dual.data' INTO TABLE araujo_sandbox.dual;
...
OK

hive> select * from dual;
OK
X

hive> CREATE  TABLE araujo_sandbox.testIssue(
    >   id int,
    >   first_name string,
    >   last_name string
    > )
    > ROW FORMAT SERDE
    >   'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    > STORED AS INPUTFORMAT
    >   'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
    > OUTPUTFORMAT
    >   'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
    > ;
OK

hive> insert into table araujo_sandbox.testIssue
    > select 1, 'John\n', 'Doe' from dual;
...
1 Rows loaded to testissue
...
OK

hive> select id from testIssue;
...
OK
1
Time taken: 4.475 seconds, Fetched: 1 row(s)

hive> select first_name from testIssue;
...
OK
John
                                             <---- there's an empty row here!!
Time taken: 4.44 seconds, Fetched: 2 row(s)  <---- there should be only 1 row

hive> select last_name from testIssue;
...
OK
Doe
Time taken: 4.414 seconds, Fetched: 1 row(s)

hive> select * from testIssue;
OK
1       John
        Doe
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive>


-- 
André Araújo
Big Data Consultant/Solutions Architect
The Pythian Group - Australia - www.pythian.com

Office (calls from within Australia): 1300 366 021 x1270
Office (international): +61 2 8016 7000  x270 *OR* +1 613 565 8696   x1270
Mobile: +61 410 323 559
Fax: +61 2 9805 0544
IM: pythianaraujo @ AIM/MSN/Y! or araujo@pythian.com @ GTalk

“Success is not about standing at the top, it's the steps you leave behind.”
— Iker Pou (rock climber)
