You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Yichuan (William) Hu" <hu...@gmail.com> on 2011/07/01 20:16:49 UTC

Loaded data all NULL

Hi,

I am doing some simple tests to create table, load data using Hive. I
am working on the VM provided by cloudera
(https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM).

I have a text file with each line containing an IP address and a name, e.g.,

123.45.67.89 tom
123.45.67.92 mark

I create a table using following command:

CREATE TABLE ip_name(
ip STRING,
name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES(
"input.regex" = "^([\d.]+) ([a-z]+)",
"output.format.string" = "%1$s %2$s"
)
STORED AS TEXTFILE;

Then, I use the following command to load data into the table:

LOAD DATA LOCAL INPATH '/home/cloudera/test.txt' OVERWRITE INTO TABLE ip_name;

Table was successfully created and file was also loaded, but all are
NULL (the number of rows in the table is the same as the number of
rows in the file). What could be the problem?

Thanks a lot!

William

Re: Loaded data all NULL

Posted by "Yichuan (William) Hu" <hu...@gmail.com>.
Thanks, Edward, I am just using this simple example to test regex
serde. Ultimately I would like to use regex to parse various types of
log files.

On Fri, Jul 1, 2011 at 2:34 PM, Edward Capriolo <ed...@gmail.com> wrote:
>
>
> On Fri, Jul 1, 2011 at 2:16 PM, Yichuan (William) Hu <hu...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> I am doing some simple tests to create table, load data using Hive. I
>> am working on the VM provided by cloudera
>> (https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM).
>>
>> I have a text file with each line containing an IP address and a name,
>> e.g.,
>>
>> 123.45.67.89 tom
>> 123.45.67.92 mark
>>
>> I create a table using following command:
>>
>> CREATE TABLE ip_name(
>> ip STRING,
>> name STRING
>> )
>> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>> WITH SERDEPROPERTIES(
>> "input.regex" = "^([\d.]+) ([a-z]+)",
>> "output.format.string" = "%1$s %2$s"
>> )
>> STORED AS TEXTFILE;
>>
>> Then, I use the following command to load data into the table:
>>
>> LOAD DATA LOCAL INPATH '/home/cloudera/test.txt' OVERWRITE INTO TABLE
>> ip_name;
>>
>> Table was successfully created and file was also loaded, but all are
>> NULL (the number of rows in the table is the same as the number of
>> rows in the file). What could be the problem?
>>
>> Thanks a lot!
>>
>> William
>
> You do not need the regex serde for this. Specify the table normally and use
> space as the delimiter.
>
> CREATE EXTERNAL TABLE logdata(
>      xxx STRING,
>      yyy STRING,
>      ...
>
>      z_t)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\040'
> STORED AS TEXTFILE;
>
> http://www.mail-archive.com/common-user@hadoop.apache.org/msg11178.html
>

Re: Loaded data all NULL

Posted by Edward Capriolo <ed...@gmail.com>.
On Fri, Jul 1, 2011 at 2:16 PM, Yichuan (William) Hu <hu...@gmail.com>wrote:

> Hi,
>
> I am doing some simple tests to create table, load data using Hive. I
> am working on the VM provided by cloudera
> (https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM).
>
> I have a text file with each line containing an IP address and a name,
> e.g.,
>
> 123.45.67.89 tom
> 123.45.67.92 mark
>
> I create a table using following command:
>
> CREATE TABLE ip_name(
> ip STRING,
> name STRING
> )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES(
> "input.regex" = "^([\d.]+) ([a-z]+)",
> "output.format.string" = "%1$s %2$s"
> )
> STORED AS TEXTFILE;
>
> Then, I use the following command to load data into the table:
>
> LOAD DATA LOCAL INPATH '/home/cloudera/test.txt' OVERWRITE INTO TABLE
> ip_name;
>
> Table was successfully created and file was also loaded, but all are
> NULL (the number of rows in the table is the same as the number of
> rows in the file). What could be the problem?
>
> Thanks a lot!
>
> William
>

You do not need the regex serde for this. Specify the table normally and use
space as the delimiter.

CREATE EXTERNAL TABLE logdata(
     xxx STRING,
     yyy STRING,
     ...
     z_t)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\040'
STORED AS TEXTFILE;

http://www.mail-archive.com/common-user@hadoop.apache.org/msg11178.html

Re: Loaded data all NULL

Posted by "Yichuan (William) Hu" <hu...@gmail.com>.
I tried \\d instead of \d, problem solved. Thanks, Neerja!

On Fri, Jul 1, 2011 at 2:28 PM, Neerja Bhatnagar <bn...@gmail.com> wrote:
> Hi,
> When I was having this issue, it was due to 2 problems - 1. the slashes in
> the regex need to be escaped Java-style so \d would be \\d and my regex's
> were wrong.
> Hope this helps!
> Cheers, Neerja
>
> On Fri, Jul 1, 2011 at 11:16 AM, Yichuan (William) Hu <hu...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> I am doing some simple tests to create table, load data using Hive. I
>> am working on the VM provided by cloudera
>> (https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM).
>>
>> I have a text file with each line containing an IP address and a name,
>> e.g.,
>>
>> 123.45.67.89 tom
>> 123.45.67.92 mark
>>
>> I create a table using following command:
>>
>> CREATE TABLE ip_name(
>> ip STRING,
>> name STRING
>> )
>> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>> WITH SERDEPROPERTIES(
>> "input.regex" = "^([\d.]+) ([a-z]+)",
>> "output.format.string" = "%1$s %2$s"
>> )
>> STORED AS TEXTFILE;
>>
>> Then, I use the following command to load data into the table:
>>
>> LOAD DATA LOCAL INPATH '/home/cloudera/test.txt' OVERWRITE INTO TABLE
>> ip_name;
>>
>> Table was successfully created and file was also loaded, but all are
>> NULL (the number of rows in the table is the same as the number of
>> rows in the file). What could be the problem?
>>
>> Thanks a lot!
>>
>> William
>
>
>
> --
> Thank you! Neerja
>
>