
Posted to user@hive.apache.org by XieXianshan <xi...@cn.fujitsu.com> on 2011/08/19 04:44:14 UTC

How to skip malformed records while loading data

Hi everyone,

Is there an option to ignore malformed records while loading data
into a Hive table? Or, alternatively, an option to skip bad rows
when querying the data?

For instance:
1. Specify a row format explicitly for a new table.
hive>create table tb (id int, pref string, zip string) row format
delimited fields terminated by ',' lines terminated by '\n';

2. Load data into the table from a CSV file that contains bad records.
hive>load data local inpath 'data.csv' overwrite into table tb;

data.csv might look like this:
32,aaa,4200002
<--Blank line
33:bbb:4200003 <--Invalid field delimiter ":"
aa,ccc,4200004 <--Non-integer value "aa"

3. Select data
hive> select * from tb;
OK
32 aaa 4200002
NULL NULL NULL
NULL NULL NULL
NULL ccc 4200004
Time taken: 0.196 seconds
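The NULLs in step 3 follow from how a delimited row format splits each line on ',' and coerces each field to its column type. The Python sketch here is only a rough approximation of that behavior, not Hive's actual SerDe code; `parse_row` and `COLUMNS` are hypothetical names, and the rules it applies (a missing field or a value that does not parse as an int becomes NULL, extra fields are ignored) are assumptions that happen to match the output above.

```python
# Hypothetical approximation of how a delimited row format (fields terminated
# by ',') maps one text line onto the columns (id int, pref string, zip string).
# Assumption: a missing field, or an id that does not parse as an int, is NULL (None).

COLUMNS = [("id", int), ("pref", str), ("zip", str)]


def parse_row(line, delimiter=","):
    """Split one line and coerce each field to its column type, using None for NULL."""
    fields = line.split(delimiter)
    row = []
    # Only the declared columns are read, so any extra fields are ignored.
    for i, (_name, col_type) in enumerate(COLUMNS):
        if i >= len(fields):
            row.append(None)  # field missing from the line -> NULL
            continue
        value = fields[i]
        if col_type is int:
            try:
                row.append(int(value))
            except ValueError:
                row.append(None)  # not a valid integer -> NULL
        else:
            row.append(value)
    return row


sample = ["32,aaa,4200002", "", "33:bbb:4200003", "aa,ccc,4200004"]
for line in sample:
    print(parse_row(line))
# -> [32, 'aaa', '4200002']
# -> [None, None, None]
# -> [None, None, None]
# -> [None, 'ccc', '4200004']
```

The blank line and the ':'-delimited line both end up as a single field that is not an integer, which is why all three columns come back NULL, while "aa" only nulls out the id column.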

I have tried setting mapred.skip.map.max.skip.records, but it does not
seem to work.
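One possible workaround, which is not a Hive option, is to drop malformed lines from the CSV before running LOAD DATA. The sketch here assumes a line is well formed only if it has exactly three comma-separated fields and an integer first field (a stricter rule than Hive's own parsing, which ignores extra fields); `is_well_formed`, `split_records`, `filter_file` and the output file name are hypothetical.

```python
# Hypothetical pre-processing step: keep only lines that fit the table
# tb (id int, pref string, zip string) before loading them into Hive.

EXPECTED_FIELDS = 3  # id, pref, zip (assumed: exactly three fields required)


def is_well_formed(line, delimiter=","):
    """Return True if the line has exactly three fields and an integer id."""
    fields = line.rstrip("\r\n").split(delimiter)
    if len(fields) != EXPECTED_FIELDS:
        return False
    try:
        int(fields[0])
    except ValueError:
        return False
    return True


def split_records(lines):
    """Partition an iterable of lines into (good, bad) lists."""
    good, bad = [], []
    for line in lines:
        (good if is_well_formed(line) else bad).append(line)
    return good, bad


def filter_file(src_path, good_path):
    """Write the well-formed lines of src_path to good_path; return (good, bad) counts."""
    with open(src_path) as src:
        good, bad = split_records(src)
    with open(good_path, "w") as out:
        out.writelines(good)
    return len(good), len(bad)
```

For example, filter_file("data.csv", "data_clean.csv") would keep only the "32,aaa,4200002" line from the sample above, and data_clean.csv could then be loaded with the same LOAD DATA statement.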

Thanks in advance.

Regards,
Xie

-- 
Best Regards
Xie Xianshan
--------------------------------------------------
Xie Xianshan
Dept.IV of Technology and Development
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, China
PostCode: 210012
PHONE: +86+25-86630566-8522
FUJITSU INTERNAL: 7998-8522
MAIL: xiexs@cn.fujitsu.com
--------------------------------------------------