You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Zheng Shao <zs...@gmail.com> on 2009/12/01 04:50:36 UTC

Re: FW: Table created using RegexSerDe doesn't work when made external

Yes Ken. Please try http://www.fileformat.info/tool/regex.htm to test
your regex to see if it can match your data or not.


Zheng

On Mon, Nov 30, 2009 at 6:11 PM,  <Ke...@wellsfargo.com> wrote:
> I looked in hive.log while doing the CREATE TABLE and found:
>
> 2009-11-30 17:51:53,240 WARN  serde2.RegexSerDe
> (RegexSerDe.java:deserialize(179)) - 1 unmatched rows are found: Nov  2
> 23:00:04 sf45-cc2 sendmail[10905]: [ID 801593 mail.info] nA3704qO010905:
> from=nagios, size=286, class=0, nrcpts=1,
> msgid=<20...@sf45-cc2.wellsfargo.com>,
> relay=nagios@localhost
>
> So instead of grabbing a log file at random, I went back and used exactly
> the same one I did when I was testing the Serde.
> This time, the SELECT statement worked fine.
>
> Maybe I’ll run my regex over a large sampling of our syslog files and see
> how many don’t conform to it. I may have to change my regex to be more
> forgiving of deviations…
> _____________________________________________
> From: Barclay, Ken
> Sent: Monday, November 30, 2009 5:07 PM
> To: hive-user@hadoop.apache.org
> Subject: Table created using RegexSerDe doesn't work when made external
>
>
> Hello,
>
> I’m getting only NULLs when trying to read data from an external table for
> which the data was staged. I’m using Cloudera’s hive-0.4.0+14.tar.gz with
> hadoop-0.20.1+152.tar.gz on a Centos machine.
>
> My steps were:
>
> # CREATE TABLE
>
> CREATE EXTERNAL TABLE mailog_stg(month STRING, day STRING, time STRING, host
> STRING, logline STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES (
>   "input.regex" = "^([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (.+)$",
>   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
> )
> STORED AS TEXTFILE
> LOCATION '/user/data/staging/mailog';
>
> # COPY DATA FILE
>
> hadoop dfs -put /ken/sf45-cc2-maillogs/mail.20091102
> /user/data/staging/mailog
>
> # DO A SELECT
>
> hive> select * from mailog_stg;
> OK
> NULL    NULL    NULL    NULL    NULL
> NULL    NULL    NULL    NULL    NULL
> NULL    NULL    NULL    NULL    NULL
> NULL    NULL    NULL    NULL    NULL
>
> I verified my data is out there:
>
> [root@jims-desktop ~]#  hadoop dfs -cat /user/data/staging/mailog/m*|wc
>       4      77    1016
>
> And in fact when I used LOAD DATA INPATH with a table defined the same way
> but without making it external, it worked fine.
>
> Any suggestions how to debug this?
>
> Thanks
> Ken
>
>



-- 
Yours,
Zheng