You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Jack Yang <ji...@uow.edu.au> on 2014/05/18 06:40:54 UTC
problem with delimiters (control A)
Hi All,
I have a local file called mytest.txt (restored in hdfs already). The content is like this:
$ cat -A HDFSLOAD_DIR/mytest.txt
49139801^A25752451^Aunknown$
49139801^A24751754^Aunknown$
49139801^A2161696^Anice$
To load this raw data above, I then defined the table like this in HQL:
create table my_test(
userid BIGINT,
movieId BIGINT,
comment STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;
My problem is that when I "SELECT * FROM my_test;" , I got this:
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
I then replace "^A" with "^" in mytest.txt, and also re-defined my table structure by using :
FIELDS TERMINATED BY '^'
So when I select all, I got the correct results.
Any thoughts??? Thanks in advance.
Best regards,
Jack
Re: problem with delimiters (control A)
Posted by Sanjay Subramanian <sa...@yahoo.com>.
Hi Jack
Since u already have your data with columns separated by CtrlA then u need to define the HIVE table as follows
(by default Hive will assume CtrlA as column delimiter)
create table if not exists my_test(
userid BIGINT,
movieId BIGINT,
comment STRING
);
regards
sanjay
From: Petter von Dolwitz (Hem) <pe...@gmail.com>
To: "user@hive.apache.org" <us...@hive.apache.org>
Sent: Thursday, May 22, 2014 8:00 AM
Subject: Re: problem with delimiters (control A)
Hi Jack,
could it be that your source file does not use \001 as delimiter but the character sequence "^A"? I get this feeling when looking at your cat printout and also when you say you can do a search and replace. If I do a print out of a file with the same delimiter I get the following
2014-04-097846478510
2014-04-107851178558
These are actually three columns but you cannot see the ^A because it is not a printable character. Check your file in a hex editor or something to verify that you have the correct delimiter.
Br,
Petter
2014-05-18 6:40 GMT+02:00 Jack Yang <ji...@uow.edu.au>:
Hi All,
>I have a local file called mytest.txt (restored in hdfs already). The content is like this:
>$ cat -A HDFSLOAD_DIR/mytest.txt
>49139801^A25752451^Aunknown$
>49139801^A24751754^Aunknown$
>49139801^A2161696^Anice$
>
>To load this raw data above, I then defined the table like this in HQL:
>
>create table my_test(
>userid BIGINT,
>movieId BIGINT,
>comment STRING
>)
>ROW FORMAT DELIMITED
>FIELDS TERMINATED BY '\001'
>STORED AS TEXTFILE;
>
>My problem is that when I “SELECT * FROM my_test;” , I got this:
>
>NULL NULL NULL
>NULL NULL NULL
>NULL NULL NULL
>
>I then replace “^A” with “^” in mytest.txt, and also re-defined my table structure by using :
>FIELDS TERMINATED BY '^’
>
>So when I select all, I got the correct results.
>
>Any thoughts??? Thanks in advance.
>
>
>Best regards,
>Jack
>
>
Re: problem with delimiters (control A)
Posted by "Petter von Dolwitz (Hem)" <pe...@gmail.com>.
Hi Jack,
could it be that your source file does not use \001 as delimiter but the
character sequence "^A"? I get this feeling when looking at your cat
printout and also when you say you can do a search and replace. If I do a
print out of a file with the same delimiter I get the following
2014-04-097846478510
2014-04-107851178558
These are actually three columns but you cannot see the ^A because it is
not a printable character. Check your file in a hex editor or something to
verify that you have the correct delimiter.
Br,
Petter
2014-05-18 6:40 GMT+02:00 Jack Yang <ji...@uow.edu.au>:
> Hi All,
>
> I have a local file called mytest.txt (restored in hdfs already). The
> content is like this:
>
> $ cat -A HDFSLOAD_DIR/mytest.txt
>
> 49139801^A25752451^Aunknown$
>
> 49139801^A24751754^Aunknown$
>
> 49139801^A2161696^Anice$
>
>
>
> To load this raw data above, I then defined the table like this in HQL:
>
>
>
> create table my_test(
>
> userid BIGINT,
>
> movieId BIGINT,
>
> comment STRING
>
> )
>
> ROW FORMAT DELIMITED
>
> FIELDS TERMINATED BY '\001'
>
> STORED AS TEXTFILE;
>
>
>
> My problem is that when I “SELECT * FROM my_test;” , I got this:
>
>
>
> NULL NULL NULL
>
> NULL NULL NULL
>
> NULL NULL NULL
>
>
>
> I then replace “^A” with “^” in mytest.txt, and also re-defined my table
> structure by using :
>
> FIELDS TERMINATED BY '^’
>
>
>
> So when I select all, I got the correct results.
>
>
>
> Any thoughts??? Thanks in advance.
>
>
>
>
>
> Best regards,
>
> Jack
>
>
>
>
>