You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Ke...@wellsfargo.com on 2009/12/01 02:07:13 UTC

Table created using RegexSerDe doesn't work when made external

Hello,

I'm getting only NULLs when trying to read data from an external table for which the data was staged. I'm using Cloudera's hive-0.4.0+14.tar.gz with hadoop-0.20.1+152.tar.gz on a Centos machine.

My steps were:

# CREATE TABLE

CREATE EXTERNAL TABLE mailog_stg(month STRING, day STRING, time STRING, host STRING, logline STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (.+)$",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
STORED AS TEXTFILE
LOCATION '/user/data/staging/mailog';

# COPY DATA FILE

hadoop dfs -put /ken/sf45-cc2-maillogs/mail.20091102  /user/data/staging/mailog

# DO A SELECT

hive> select * from mailog_stg;
OK
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL

I verified my data is out there:

[root@jims-desktop ~]#  hadoop dfs -cat /user/data/staging/mailog/m*|wc
      4      77    1016

And in fact when I used LOAD DATA INPATH with a table defined the same way but without making it external, it worked fine.

Any suggestions how to debug this?

Thanks
Ken