You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Jack Yang <ji...@uow.edu.au> on 2014/05/18 06:40:54 UTC

problem with delimiters (control A)

Hi All,
I have a local file called mytest.txt (restored in hdfs already). The content is like this:
$ cat -A  HDFSLOAD_DIR/mytest.txt
49139801^A25752451^Aunknown$
49139801^A24751754^Aunknown$
49139801^A2161696^Anice$

To load this raw data above, I then defined the table like this in HQL:

create table my_test(
userid          BIGINT,
movieId         BIGINT,
comment         STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;

My problem is that when I "SELECT * FROM my_test;" , I got this:

NULL    NULL    NULL
NULL    NULL    NULL
NULL    NULL    NULL

I then replace "^A" with "^" in mytest.txt, and also re-defined my table structure by using :
FIELDS TERMINATED BY '^'

So when I select all, I got the correct results.

Any thoughts??? Thanks in advance.


Best regards,
Jack



Re: problem with delimiters (control A)

Posted by Sanjay Subramanian <sa...@yahoo.com>.
Hi Jack 

Since u already have your data with columns separated by CtrlA then u need to define the HIVE table as follows 
(by default Hive will assume CtrlA as column delimiter)

create table if not exists my_test(
    userid          BIGINT,
    movieId         BIGINT,
    comment         STRING
);


regards

sanjay


From: Petter von Dolwitz (Hem) <pe...@gmail.com>
To: "user@hive.apache.org" <us...@hive.apache.org> 
Sent: Thursday, May 22, 2014 8:00 AM
Subject: Re: problem with delimiters (control A)
 


Hi Jack,

could it be that your source file does not use \001 as delimiter but the character sequence "^A"? I get this feeling when looking at your cat printout and also when you say you can do a search and replace. If I do a print out of a file with the same delimiter I get the following

2014-04-097846478510
2014-04-107851178558

These are actually three columns but you cannot see the ^A because it is not a printable character. Check your file in a hex editor or something to verify that you have the correct delimiter.

Br,
Petter













2014-05-18 6:40 GMT+02:00 Jack Yang <ji...@uow.edu.au>:

Hi All, 
>I have a local file called mytest.txt (restored in hdfs already). The content is like this:
>$ cat -A  HDFSLOAD_DIR/mytest.txt
>49139801^A25752451^Aunknown$
>49139801^A24751754^Aunknown$
>49139801^A2161696^Anice$
> 
>To load this raw data above, I then defined the table like this in HQL:
> 
>create table my_test(
>userid          BIGINT,
>movieId         BIGINT,
>comment         STRING
>) 
>ROW FORMAT DELIMITED 
>FIELDS TERMINATED BY '\001'
>STORED AS TEXTFILE;
> 
>My problem is that when I “SELECT * FROM my_test;” , I got this:
> 
>NULL    NULL    NULL
>NULL    NULL    NULL
>NULL    NULL    NULL
> 
>I then replace “^A” with “^” in mytest.txt, and also re-defined my table structure by using :
>FIELDS TERMINATED BY '^’
> 
>So when I select all, I got the correct results.
> 
>Any thoughts??? Thanks in advance.
> 
> 
>Best regards,
>Jack 
> 
> 

Re: problem with delimiters (control A)

Posted by "Petter von Dolwitz (Hem)" <pe...@gmail.com>.
Hi Jack,

could it be that your source file does not use \001 as delimiter but the
character sequence "^A"? I get this feeling when looking at your cat
printout and also when you say you can do a search and replace. If I do a
print out of a file with the same delimiter I get the following

2014-04-097846478510
2014-04-107851178558

These are actually three columns but you cannot see the ^A because it is
not a printable character. Check your file in a hex editor or something to
verify that you have the correct delimiter.

Br,
Petter











2014-05-18 6:40 GMT+02:00 Jack Yang <ji...@uow.edu.au>:

> Hi All,
>
> I have a local file called mytest.txt (restored in hdfs already). The
> content is like this:
>
> $ cat -A  HDFSLOAD_DIR/mytest.txt
>
> 49139801^A25752451^Aunknown$
>
> 49139801^A24751754^Aunknown$
>
> 49139801^A2161696^Anice$
>
>
>
> To load this raw data above, I then defined the table like this in HQL:
>
>
>
> create table my_test(
>
> userid          BIGINT,
>
> movieId         BIGINT,
>
> comment         STRING
>
> )
>
> ROW FORMAT DELIMITED
>
> FIELDS TERMINATED BY '\001'
>
> STORED AS TEXTFILE;
>
>
>
> My problem is that when I “SELECT * FROM my_test;” , I got this:
>
>
>
> NULL    NULL    NULL
>
> NULL    NULL    NULL
>
> NULL    NULL    NULL
>
>
>
> I then replace “^A” with “^” in mytest.txt, and also re-defined my table
> structure by using :
>
> FIELDS TERMINATED BY '^’
>
>
>
> So when I select all, I got the correct results.
>
>
>
> Any thoughts??? Thanks in advance.
>
>
>
>
>
> Best regards,
>
> Jack
>
>
>
>
>